Abstract

Infinite random sequences of letters can be viewed as stochastic chains or as strings
produced by a source, in the sense of information theory. The relationship between
Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied.
We establish a probabilistic frame for context trees and VLMC and we prove that
any VLMC is a dynamical source for which we explicitly build the mapping. On two
examples, the "comb" and the "bamboo blossom", we find a necessary and sufficient
condition for the existence and the unicity of a stationary probability measure for the
VLMC. These two examples are detailed in order to provide the associated Dirichlet
series as well as the generating functions of word occurrences.

1 Introduction

Our objects of interest are infinite random sequences of letters. One can imagine 
DNA sequences (the letters are A, C, G, T), bit sequences (the letters are 0, 1)
or any random sequence on a finite alphabet. Such a sequence can be viewed as 
a stochastic chain or as a string produced by a source, in the sense of information 
theory. We study this relation for the so-called Variable Length Markov Chains 
(VLMC). 

From now on, we are given a finite alphabet $\mathcal{A}$. An infinite random sequence of
letters is often considered as a chain $(X_n)_{n\in\mathbb{Z}}$, i.e. an $\mathcal{A}^{\mathbb{Z}}$-valued random variable.
The $X_n$ are the letters of the chain. Equivalently such a chain can be viewed as
a random process $(U_n)_{n\in\mathbb{N}}$ that takes values in the set $\mathcal{L} := \mathcal{A}^{-\mathbb{N}}$ of left-infinite
words¹ and that grows by addition of a letter on the right at each step of discrete
time. The $\mathcal{L}$-valued processes we consider are Markovian ones. The evolution
from $U_n = \cdots X_{-1}X_0X_1\ldots X_n$ to $U_{n+1} = U_nX_{n+1}$ is described by the transition
probabilities $P(U_{n+1} = U_n\alpha \mid U_n)$, $\alpha\in\mathcal{A}$.

In the context of chains, the point of view has mainly been a statistical one until
now, going back to Harris [Ij] who speaks of chains of infinite order to express
the fact that the production of a new letter depends on a finite but unbounded
number of previous letters. Comets et al. [7] and Gallo and Garcia [11] deal with
chains of infinite memory. Rissanen [23] introduces a class of models where the
transition from the word $U_n$ to the word $U_{n+1} = U_nX_{n+1}$ depends on $U_n$ through
a finite suffix of $U_n$ and he calls this relevant part of the past a context. Contexts
can be stored as the leaves of a so-called context tree so that the model is entirely
defined by a family of probability distributions indexed by the leaves of a context
tree. In this paper, Rissanen develops a near optimal universal data compression
algorithm for long strings generated by non independent information sources.
The name VLMC is due to Bühlmann and Wyner [5]. It emphasizes the fact
that the length of memory needed to predict the next letter is a not necessarily
bounded function of the sequence $U_n$. An overview on VLMC can be found in
Galves and Löcherbach [12].



We give in Section 2 a complete probabilistic definition of VLMC. Let us present
here a foretaste, relying on the particular form of the transition probabilities
$P(U_{n+1} = U_n\alpha \mid U_n)$. Let $\mathcal{T}$ be a saturated tree on $\mathcal{A}$, which means that every
internal node of the tree (i.e. a word on $\mathcal{A}$) has exactly $|\mathcal{A}|$ children. With each
leaf $c$ of the tree, also called a context, is associated a probability distribution $q_c$
on $\mathcal{A}$. The basic fact is that any left-infinite sequence can thus be "plugged in" a
unique context of the tree $\mathcal{T}$: any $U_n$ can be uniquely written $U_n = \cdots\overline{c}$, where,
for any word $c = \alpha_1\cdots\alpha_N$, $\overline{c}$ denotes the reversed word $\overline{c} = \alpha_N\cdots\alpha_1$. In other
terms, for any $n$, there is a unique context $c$ in the tree $\mathcal{T}$ such that $\overline{c}$ is a suffix
of $U_n$; this word is denoted by $c = \overleftarrow{\mathrm{pref}}(U_n)$. We define the VLMC associated
with these data as the $\mathcal{L}$-valued homogeneous Markov process whose transition
probabilities are, for any letter $\alpha\in\mathcal{A}$,

$$P(U_{n+1} = U_n\alpha \mid U_n) = q_{\overleftarrow{\mathrm{pref}}(U_n)}(\alpha).$$

¹ In the whole text, $\mathbb{N}$ denotes the set of nonnegative integers.



When the tree is finite, the final letter process $(X_n)_{n\ge0}$ is an ordinary Markov
chain whose order is the height of the tree. The case of infinite trees is more
interesting, providing concrete examples of non Markov chains.




Figure 1: example of probabilized context tree (on the left) and its corresponding
dynamical system (on the right).

In the example of Figure 1, the context tree is finite of height 4 and, for instance,
$P(U_{n+1} = U_n0 \mid U_n = \cdots0101110) = q_{011}(0)$ because $\overleftarrow{\mathrm{pref}}(\cdots0101110) = 011$
(read the word $\cdots0101110$ right-to-left and stop when finding a context).
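The right-to-left reading rule can be sketched in a few lines of Python. This is only an illustration: the function name `pref` and the explicit list of five leaves are assumptions (the leaves are inferred from the labels of Figure 1 and from the stated height 4, not given in the text).

```python
def pref(contexts, past):
    """Return the context met first when `past` is read right-to-left.

    `contexts` is the set of leaves of a finite saturated context tree and
    `past` is a long enough finite suffix of the left-infinite sequence.
    """
    node = ""
    for letter in reversed(past):
        node += letter
        if node in contexts:
            return node
    raise ValueError("past is too short to reach a context")


# Assumed leaves of the Figure 1 tree (five contexts, height 4).
FIGURE_1_CONTEXTS = {"1", "00", "011", "0100", "0101"}

print(pref(FIGURE_1_CONTEXTS, "0101110"))  # prints 011
```

Only the last few letters of the past are inspected, which is why the memory needed to predict the next letter varies with the sequence.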

In information theory, one considers that words are produced by a probabilistic
source as developed in Vallée and her group papers (see Clément et al. [6] for an
overview). In particular, a probabilistic dynamical source is defined by a coding
function $\rho : [0,1]\to\mathcal{A}$, a mapping $T : [0,1]\to[0,1]$ having suitable properties
and a probability measure $\mu$ on $[0,1]$. These data being given, the dynamical
source produces the $\mathcal{A}$-valued random process $(Y_n)_{n\in\mathbb{N}} := (\rho(T^n\xi))_{n\in\mathbb{N}}$, where $\xi$ is
a $\mu$-distributed random variable on $[0,1]$. On the right side of Figure 1, one can
see the graph of some $T$, a subdivision of $[0,1]$ in two subintervals $I_0 = \rho^{-1}(0)$
and $I_1 = \rho^{-1}(1)$ and the first three real numbers $x$, $Tx$ and $T^2x$, where $x$ is a
realization of the random variable $\xi$. The right-infinite word corresponding to
this example has 010 as a prefix.

We prove in Theorem 3.22 that every stationary VLMC is a dynamical source.
More precisely, given a stationary VLMC, $(U_n)_{n\in\mathbb{N}}$ say, we construct explicitly
a dynamical source $(Y_n)_{n\in\mathbb{N}}$ such that the letter processes $(X_n)_{n\in\mathbb{N}}$ and $(Y_n)_{n\in\mathbb{N}}$
are symmetrically distributed, which means that for any finite word $w$ of length
$N+1$, $P(X_0\ldots X_N = \overline{w}) = P(Y_0\ldots Y_N = w)$. In Figure 1, the dynamical
system together with Lebesgue measure on $[0,1]$ define a probabilistic source that
corresponds to the stationary VLMC defined by the drawn probabilized context
tree.

The previous result is possible only when the VLMC is stationary. The question
of existence and unicity of a stationary distribution arises naturally. We give a
complete answer in two particular cases (Proposition 4.23 and Proposition 4.32
in Section 4) and we propose some directions for the general case. Our two examples
are called the "infinite comb" and the "bamboo blossom"; they can be visualized
in the figures of Section 4 (Figure 6 shows the infinite comb). Both have an infinite
branch so that the letter process of the VLMC is non Markovian. They provide
quite concrete cases of infinite order chains where the study can be completely
handled. We first exhibit a necessary and sufficient condition for existence and
unicity of a stationary measure. Then the dynamical system is explicitly built
and drawn. In particular, for some suitable data values, one gets in this way
examples of intermittent sources.

Quantifying and visualizing repetitions of patterns is another natural question 
arising in combinatorics on words. Tries, suffix tries and digital search trees are
usual convenient tools. The analysis of such structures relies on the generating 
functions of the word occurrences and on the Dirichlet series attached to the 
sources. In both examples, these computations are performed. 

The paper is organized as follows. Section 2 is devoted to the precise definition
of variable length Markov chains. In Section 3, the main result, Theorem 3.22, is
established. In Section 4, we complete the paper with our two detailed examples:
"infinite comb" and "bamboo blossom". The last section gathers some prospects
and open problems.



2 Context trees and variable length Markov chains 

In this section, we first define probabilized context trees; then we associate with a 
probabilized context tree a so-called variable length Markov chain (VLMC).

2.1 Words and context trees 

Let $\mathcal{A}$ be a finite alphabet, i.e. a finite ordered set. Its cardinality is denoted by
$|\mathcal{A}|$. For the sake of shortness, our results in the paper are given for the alphabet
$\mathcal{A} = \{0,1\}$ but they remain true for any finite alphabet. Let

$$\mathcal{W} = \bigcup_{n\ge0}\mathcal{A}^n$$

be the set of all finite words over $\mathcal{A}$. The concatenation of two words $v = v_1\ldots v_M$
and $w = w_1\ldots w_N$ is $vw = v_1\ldots v_Mw_1\ldots w_N$. The empty word is denoted by $\emptyset$.
Let

$$\mathcal{L} = \mathcal{A}^{-\mathbb{N}}$$

be the set of left-infinite sequences over $\mathcal{A}$ and

$$\mathcal{R} = \mathcal{A}^{\mathbb{N}}$$

be the set of right-infinite sequences over $\mathcal{A}$. If $k$ is a nonnegative integer and if
$w = \alpha_{-k}\cdots\alpha_0$ is any finite word on $\mathcal{A}$, the reversed word is denoted by

$$\overline{w} = \alpha_0\cdots\alpha_{-k}.$$

The cylinder based on $w$ is defined as the set of all left-infinite sequences having
$w$ as a suffix:

$$\mathcal{L}w = \{s\in\mathcal{L},\ \forall j\in\{-k,\ldots,0\},\ s_j = \alpha_j\}.$$

By extension, the reversed sequence of $s = \cdots\alpha_{-1}\alpha_0\in\mathcal{L}$ is $\overline{s} = \alpha_0\alpha_{-1}\cdots\in\mathcal{R}$.
The set $\mathcal{L}$ is equipped with the $\sigma$-algebra generated by all cylinders based on
finite words. The set $\mathcal{R}$ is equipped with the $\sigma$-algebra generated by all cylinders
$w\mathcal{R} = \{r\in\mathcal{R},\ w \text{ is a prefix of } r\}$.

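Let $\mathcal{T}$ be a tree, i.e. a subset of $\mathcal{W}$ satisfying two conditions:

• $\emptyset\in\mathcal{T}$

• $\forall u, v\in\mathcal{W}$, $uv\in\mathcal{T} \Rightarrow u\in\mathcal{T}$.

This corresponds to the definition of rooted planar trees in algorithmics. Let
$\mathcal{C}^F(\mathcal{T})$ be the set of finite leaves of $\mathcal{T}$, i.e. the nodes of $\mathcal{T}$ without any descendant:

$$\mathcal{C}^F(\mathcal{T}) = \{u\in\mathcal{T},\ \forall j\in\mathcal{A},\ uj\notin\mathcal{T}\}.$$

An infinite word $u\in\mathcal{R}$ such that any finite prefix of $u$ belongs to $\mathcal{T}$ is called an
infinite leaf of $\mathcal{T}$. Let us denote the set of infinite leaves of $\mathcal{T}$ by

$$\mathcal{C}^I(\mathcal{T}) = \{u\in\mathcal{R},\ \forall v \text{ prefix of } u,\ v\in\mathcal{T}\}.$$

Let $\mathcal{C}(\mathcal{T}) = \mathcal{C}^F(\mathcal{T})\cup\mathcal{C}^I(\mathcal{T})$ be the set of all leaves of $\mathcal{T}$. The set $\mathcal{T}\setminus\mathcal{C}^F(\mathcal{T})$ is
constituted by the internal nodes of $\mathcal{T}$. When there is no ambiguity, $\mathcal{T}$ is omitted
and we simply write $\mathcal{C}$, $\mathcal{C}^F$ and $\mathcal{C}^I$.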

Definition 2.1 A tree is saturated when each internal node $w$ has exactly $|\mathcal{A}|$
children, namely the set $\{w\alpha,\ \alpha\in\mathcal{A}\}\subset\mathcal{T}$.

Definition 2.2 (Context tree)

A context tree is a saturated tree having a finite or denumerable set of leaves.
The leaves are called contexts.

Definition 2.3 (Probabilized context tree)

A probabilized context tree is a pair

$$(\mathcal{T}, (q_c)_{c\in\mathcal{C}(\mathcal{T})})$$

where $\mathcal{T}$ is a context tree over $\mathcal{A}$ and $(q_c)_{c\in\mathcal{C}(\mathcal{T})}$ is a family of probability measures
on $\mathcal{A}$, indexed by the denumerable set $\mathcal{C}(\mathcal{T})$ of all leaves of $\mathcal{T}$.

Example 2.4 See Figure 1 for an example of finite probabilized context tree with
five contexts. See Figure 6 for an example of infinite probabilized context tree,
called the infinite comb.

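Definition 2.5 A subset $\mathcal{K}$ of $\mathcal{W}\cup\mathcal{R}$ is a cutset of the complete $|\mathcal{A}|$-ary tree
when both following conditions hold

(i) no word of $\mathcal{K}$ is a prefix of another word of $\mathcal{K}$

(ii) $\forall r\in\mathcal{R}$, $\exists u\in\mathcal{K}$, $u$ prefix of $r$.

Condition (i) entails unicity in (ii). Obviously a tree $\mathcal{T}$ is saturated if and only
if the set of its leaves $\mathcal{C}$ is a cutset. Take a saturated tree, then

$$\forall r\in\mathcal{R},\ \text{either } r\in\mathcal{C}^I \text{ or } \exists!\,u\in\mathcal{W},\ u\in\mathcal{C}^F,\ u \text{ prefix of } r. \qquad (1)$$

This can also be said on left-infinite sequences:

$$\forall s\in\mathcal{L},\ \text{either } \overline{s}\in\mathcal{C}^I \text{ or } \exists!\,w\in\mathcal{W},\ w\in\mathcal{C}^F,\ \overline{w} \text{ suffix of } s. \qquad (2)$$

In other words:

$$\mathcal{L} = \bigcup_{s\in\mathcal{C}^I}\{\overline{s}\}\ \cup\bigcup_{w\in\mathcal{C}^F}\mathcal{L}\overline{w}. \qquad (3)$$

This partition of $\mathcal{L}$ will be extensively used in the sequel. Both cutset properties
(1) and (2) will be used in the paper, on $\mathcal{R}$ for trees, on $\mathcal{L}$ for chains. Both orders
of reading will be needed.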
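For a finite set of finite words, the two cutset conditions can be checked mechanically. The sketch below is only illustrative: the function names are hypothetical and the two sets tested are small hand-made examples over $\{0,1\}$; infinite leaves are not handled.

```python
from itertools import product


def is_prefix_free(words):
    """Condition (i): no word is a proper prefix of another one."""
    return not any(u != v and v.startswith(u) for u in words for v in words)


def covers_all_sequences(words, alphabet="01"):
    """Condition (ii), for a finite set of finite words.

    Every right-infinite sequence has a prefix in `words` as soon as every
    word of length max(len(u)) has one, so a finite check is enough.
    """
    depth = max(len(u) for u in words)
    return all(
        any("".join(r).startswith(u) for u in words)
        for r in product(alphabet, repeat=depth)
    )


def is_cutset(words, alphabet="01"):
    return is_prefix_free(words) and covers_all_sequences(words, alphabet)


print(is_cutset({"1", "00", "010", "011"}))  # leaves of a saturated tree
print(is_cutset({"1", "00", "010"}))         # sequences starting with 011 are missed
```

The first set is the set of leaves of a saturated tree, the second one is not, since the internal node 01 would have a single child.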

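Definition 2.6 (Prefix function) Let $\mathcal{T}$ be a saturated tree and $\mathcal{C}$ its set of
contexts. For any $s\in\mathcal{L}$, $\overleftarrow{\mathrm{pref}}(s)$ denotes the unique context $\alpha_1\ldots\alpha_N$ such that
$s = \ldots\alpha_N\ldots\alpha_1$. The map

$$\overleftarrow{\mathrm{pref}} : \mathcal{L}\to\mathcal{C}$$

is called the prefix function. For technical reasons, this function is extended to

$$\overleftarrow{\mathrm{pref}} : \mathcal{L}\cup\mathcal{W}\to\mathcal{T}$$

in the following way:

• if $\overline{w}\in\mathcal{T}$ then $\overleftarrow{\mathrm{pref}}(w) = \overline{w}$;

• if $\overline{w}\in\mathcal{W}\setminus\mathcal{T}$ then $\overleftarrow{\mathrm{pref}}(w)$ is the unique context $\alpha_1\ldots\alpha_N$ such that $w$ has
$\alpha_N\ldots\alpha_1$ as a suffix.

Note that the second item of the definition is also valid when $\overline{w}\in\mathcal{C}$. Moreover
$\overleftarrow{\mathrm{pref}}(w)$ is always a context except when $\overline{w}$ is an internal node.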

2.2 VLMC associated with a context tree 

Definition 2.7 (VLMC) 

Let $(\mathcal{T}, (q_c)_{c\in\mathcal{C}})$ be a probabilized context tree. The associated Variable Length
Markov Chain (VLMC) is the order 1 Markov chain $(U_n)_{n\ge0}$ with state space $\mathcal{L}$,
defined by the transition probabilities

$$\forall n\ge0,\ \forall\alpha\in\mathcal{A},\quad P(U_{n+1} = U_n\alpha \mid U_n) = q_{\overleftarrow{\mathrm{pref}}(U_n)}(\alpha). \qquad (4)$$

Remark 2.8 As usual, we speak of the Markov chain defined by the transition
probabilities (4), because these data together with the distribution of $U_0$ define a
unique $\mathcal{L}$-valued random process (see for example Revuz [2H]).



The rightmost letter of the sequence $U_n\in\mathcal{L}$ will be denoted by $X_n$ so that

$$\forall n\ge0,\quad U_{n+1} = U_nX_{n+1}.$$

The final letter process $(X_n)_{n\ge0}$ is not Markov as soon as the context tree has at
least one infinite context. As already mentioned in the introduction, when the
tree is finite, $(X_n)_{n\ge0}$ is a Markov chain whose order is the height of the tree,
i.e. the length of its longest branch. The term VLMC is somewhat confusing
but commonly used.
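As an illustration of Definition 2.7, here is a minimal simulation sketch for a finite context tree over $\{0,1\}$. The function names are hypothetical; the data are the four contexts and the values $q_c(0)$ quoted in Section 3.2.3 for the four flower bamboo, and the initial past "110" is an arbitrary choice.

```python
import random


def context_of(contexts, past):
    """Read `past` right-to-left until a context (a leaf) is met."""
    node = ""
    for letter in reversed(past):
        node += letter
        if node in contexts:
            return node
    raise ValueError("past is too short to reach a context")


def simulate_vlmc(q0, past, steps, rng):
    """Append `steps` letters to `past`.

    `q0` maps each context c to q_c(0), the probability of emitting 0.
    """
    letters = list(past)
    for _ in range(steps):
        c = context_of(q0, letters)
        letters.append("0" if rng.random() < q0[c] else "1")
    return "".join(letters)


# Illustrative data: values q_c(0) of the four flower bamboo example.
Q0 = {"00": 0.4, "010": 0.6, "011": 0.8, "1": 0.3}

print(simulate_vlmc(Q0, "110", 20, random.Random(0)))
```

Each step only needs the current context, so the whole left-infinite state $U_n$ never has to be stored for a finite tree; the last letters, up to the height of the tree, are enough.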

Definition 2.9 (SVLMC) Let $(U_n)_{n\ge0}$ be a VLMC. When a stationary probability
measure on $\mathcal{L}$ exists and when it is the initial distribution, we say that
$(U_n)_{n\ge0}$ is a Stationary Variable Length Markov Chain (SVLMC).

Remark 2.10 In the literature, the name VLMC is usually applied to the chain
$(X_n)_{n\in\mathbb{Z}}$. There exists a natural bijective correspondence between $\mathcal{A}$-valued chains
$(X_n)_{n\in\mathbb{Z}}$ and $\mathcal{L}$-valued processes $(U_n = U_0X_1\ldots X_n,\ n\ge0)$. Consequently, finding
a stationary probability for the chain $(X_n)_{n\in\mathbb{Z}}$ is equivalent to finding a stationary
probability for the process $(U_n)_{n\ge0}$.

3 Stationary variable length Markov chains 

The existence and the unicity of a stationary measure for two examples of VLMC
will be established in Section 4. In the present section, we assume that a stationary
measure $\pi$ on $\mathcal{L}$ exists and we consider a $\pi$-distributed VLMC. In the
preliminary Section 3.1, we show how the stationary probability of finite words
can be expressed as a function of the data and the values of $\pi$ at the tree nodes.
In Section 3.2, the main theorem is proved.

3.1 General facts on stationary probability measures 

For the sake of shortness, when $\pi$ is a stationary probability for a VLMC, we
write $\pi(w)$ instead of $\pi(\mathcal{L}w)$, for any $w\in\mathcal{W}$:

$$\pi(w) = P(U_0\in\mathcal{L}w) = P(X_{-(|w|-1)}\ldots X_0 = w). \qquad (5)$$

Extension of notation $q_u$ for internal nodes.

The VLMC is defined by its context tree $\mathcal{T}$ together with a family $(q_c)_{c\in\mathcal{C}}$ of
probability measures on $\mathcal{A}$ indexed by the contexts of the tree. When $u$ is an
internal node of the context tree, we extend the notation $q_u$ by

$$q_u(\alpha) = \begin{cases}\dfrac{\pi(\overline{u}\alpha)}{\pi(\overline{u})} & \text{if } \pi(\overline{u})\ne0\\[2mm] 0 & \text{if } \pi(\overline{u}) = 0\end{cases} \qquad (6)$$

for any $\alpha\in\mathcal{A}$. Thus, in any case, $\pi$ being stationary, $\pi(\overline{u}\alpha) = \pi(\overline{u})q_u(\alpha)$ as soon
as $u$ is an internal node of the context tree. With this notation, the stationary
probability of any cylinder can be expressed by the following simple Formula (8).

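Lemma 3.11 Consider a SVLMC defined by a probabilized context tree and let
$\pi$ denote any stationary probability measure on $\mathcal{L}$. Then,
(i) for any finite word $w\in\mathcal{W}$ and for any letter $\alpha\in\mathcal{A}$,

$$\pi(w\alpha) = \pi(w)\,q_{\overleftarrow{\mathrm{pref}}(w)}(\alpha). \qquad (7)$$

(ii) For any finite word $w = \alpha_1\ldots\alpha_N\in\mathcal{W}$,

$$\pi(w) = \prod_{k=0}^{N-1} q_{\overleftarrow{\mathrm{pref}}(\alpha_1\ldots\alpha_k)}(\alpha_{k+1}) \qquad (8)$$

(if $k = 0$, $\alpha_1\ldots\alpha_k$ denotes the empty word $\emptyset$, $\overleftarrow{\mathrm{pref}}(\emptyset) = \emptyset$, $q_\emptyset(\alpha) = \pi(\alpha)$ and
$\pi(\emptyset) = \pi(\mathcal{L}) = 1$).

Proof. (i) If $\overline{w}$ is an internal node of the context tree, then $\overleftarrow{\mathrm{pref}}(w) = \overline{w}$ and the
formula comes directly from the definition of $q_{\overline{w}}$. If not, $\pi(w\alpha) = P(U_1\in\mathcal{L}w\alpha)$
by stationarity; because of Markov property,

$$\pi(w\alpha) = P(U_0\in\mathcal{L}w)\,P(U_1\in\mathcal{L}w\alpha \mid U_0\in\mathcal{L}w) = \pi(w)\,q_{\overleftarrow{\mathrm{pref}}(w)}(\alpha).$$

Finally, (ii) follows from (i) by a straightforward induction. ■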

Remark 3.12 When $\mathcal{A} = \{0,1\}$ and $\pi$ is any stationary probability of a SVLMC,
then, for any natural number $n$, $\pi(10^n) = \pi(0^n1)$. Indeed, on one hand, by
disjoint union, $\pi(0^n) = \pi(0^{n+1}) + \pi(10^n)$. On the other hand, by stationarity,

$$\pi(0^n) = P(X_1\ldots X_n = 0^n) = P(X_0\ldots X_{n-1} = 0^n)$$
$$= P(X_0\ldots X_n = 0^{n+1}) + P(X_0\ldots X_n = 0^n1) = \pi(0^{n+1}) + \pi(0^n1).$$

These equalities lead to the result. Of course, symmetrically, $\pi(01^n) = \pi(1^n0)$
under the same assumptions.
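Formula (8) and Remark 3.12 can be checked by hand in the simplest situation, where the contexts are the two letters themselves (an order 1 Markov chain). The sketch below is an assumption-laden illustration: the transition probabilities are borrowed from the matrix quoted in Section 3.2.1, with rows read as the current letter, and the names `Q`, `PI1` and `pi` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical order-1 example: contexts "0" and "1", transition
# probabilities taken from the matrix of Section 3.2.1 (rows = current letter).
Q = {"0": {"0": F(4, 10), "1": F(6, 10)},
     "1": {"0": F(7, 10), "1": F(3, 10)}}

# Stationary law of a two-state chain: pi(0) * Q[0][1] = pi(1) * Q[1][0].
PI1 = {"0": F(7, 13), "1": F(6, 13)}


def pi(word):
    """Formula (8) when every context has length one."""
    if not word:
        return F(1)
    prob = PI1[word[0]]
    for prev, nxt in zip(word, word[1:]):
        prob *= Q[prev][nxt]
    return prob


for n in range(1, 5):
    left, right = "1" + "0" * n, "0" * n + "1"
    print(n, pi(left), pi(right), pi(left) == pi(right))
```

Exact rational arithmetic is used so that the equality $\pi(10^n) = \pi(0^n1)$ is tested without rounding.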
3.2 Dynamical system associated with a VLMC

We begin with a general presentation of a probabilistic dynamical source in
Section 3.2.1. Then we build step by step partitions of the interval $[0,1]$
(Section 3.2.2) and a mapping (Section 3.2.3) based on the stationary measure of
a given SVLMC. It appears in Section 3.2.4 that this particular mapping keeps
Lebesgue measure invariant. All these arguments combine to provide in the last
Section 3.2.5 the proof of Theorem 3.22 which allows us to see a VLMC as a
dynamical source.

In the whole section, $I$ stands for the real interval $[0,1]$ and the Lebesgue measure
of a Borelian $J$ will be denoted by $|J|$.

3.2.1 General probabilistic dynamical sources 

Let us present here the classical formalism of probabilistic dynamical sources (see
Clément et al. [6]). It is defined by four elements:

• a topological partition of $I$ by intervals $(I_\alpha)_{\alpha\in\mathcal{A}}$,

• a coding function $\rho : I\to\mathcal{A}$, such that, for each letter $\alpha$, the restriction of
$\rho$ to $I_\alpha$ is equal to $\alpha$,

• a mapping $T : I\to I$,

• a probability measure $\mu$ on $I$.

Such a dynamical source defines an $\mathcal{A}$-valued random process $(Y_n)_{n\in\mathbb{N}}$ as follows.
Pick a random real number $x$ according to the measure $\mu$. The mapping $T$ yields
the orbit $(x, T(x), T^2(x), \ldots)$ of $x$. Thanks to the coding function, this defines the
right-infinite sequence $\rho(x)\rho(T(x))\rho(T^2(x))\cdots$ whose letters are $Y_n := \rho(T^n(x))$
(see Figure 2).
For any finite word $w = \alpha_0\ldots\alpha_N\in\mathcal{W}$, let

$$B_w = \bigcap_{k=0}^{N} T^{-k}I_{\alpha_k}$$

be the Borelian set of real numbers $x$ such that the sequence $(Y_n)_{n\in\mathbb{N}}$ has $w$ as a
prefix. Consequently, the probability that the source emits a sequence of symbols
starting with the pattern $w$ is equal to $\mu(B_w)$. When the initial probability
measure $\mu$ on $I$ is $T$-invariant, the dynamical source generates a stationary
$\mathcal{A}$-valued random process; in particular, for any $n\in\mathbb{N}$, the random variable $Y_n$
has the same distribution as $Y_0$.
Figure 2: the graph of a mapping $T$, the intervals $I_0$ and $I_1$ that code the interval
$I$ by the alphabet $\mathcal{A} = \{0,1\}$ and the first three points of the orbit of an $x\in I$
by the corresponding dynamical system.

The following classical examples often appear in the literature: let $p\in\,]0,1[$,
$I_0 = [0, 1-p]$ and $I_1 = \,]1-p, 1]$. Let $T : I\to I$ be the only function which
maps linearly and increasingly $I_0$ and $I_1$ onto $I$ (see Figure 3 when $p = 0.65$,
left side). Then, starting from Lebesgue measure, the corresponding probabilistic
dynamical source is Bernoulli: the $Y_n$ are i.i.d. and $P(Y_0 = 1) = p$. In the same
vein, if $T$ is the mapping drawn on the right side of Figure 3, starting from
Lebesgue measure, the $\{0,1\}$-valued process $(Y_n)_{n\in\mathbb{N}}$ is Markov and stationary,
with transition matrix

$$\begin{pmatrix} 0.4 & 0.6\\ 0.7 & 0.3\end{pmatrix}.$$

The assertions on both examples are consequences of Thales theorem. These two
basic examples are particular cases of Theorem 3.22.
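The Bernoulli example can be followed step by step on one orbit. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the starting point $x = 1/2$ is arbitrary, and exact fractions are used because floating-point orbits of expanding maps lose precision quickly.

```python
from fractions import Fraction as F


def bernoulli_map(p):
    """Return (rho, T) for I0 = [0, 1-p], I1 = ]1-p, 1]."""
    cut = 1 - p

    def rho(x):
        return 0 if x <= cut else 1

    def T(x):
        # Each branch maps its interval linearly and increasingly onto [0, 1].
        return x / cut if x <= cut else (x - cut) / p

    return rho, T


def emitted_word(x, p, length):
    """First `length` letters rho(x), rho(Tx), rho(T^2 x), ..."""
    rho, T = bernoulli_map(p)
    letters = []
    for _ in range(length):
        letters.append(str(rho(x)))
        x = T(x)
    return "".join(letters)


print(emitted_word(F(1, 2), F(13, 20), 4))  # prints 1011
```

With $p = 0.65$ the orbit of $1/2$ is $1/2$, $3/13$, $60/91$, $563/1183$, which lie successively in $I_1$, $I_0$, $I_1$, $I_1$.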

3.2.2 Ordered subdivisions and ordered partitions of the interval 

Figure 3: mappings generating a Bernoulli source and a stationary Markov chain
of order 1. In both cases, Lebesgue measure is the initial one.

Definition 3.13 A family $(I_w)_{w\in\mathcal{W}}$ of subintervals of $I$ indexed by all finite words
is said to be an $\mathcal{A}$-adic subdivision of $I$ whenever

(i) for any $w\in\mathcal{W}$, $I_w$ is the disjoint union of the $I_{w\alpha}$, $\alpha\in\mathcal{A}$;

(ii) for any $v, w\in\mathcal{W}$, if $v < w$ for the alphabetical order, then

$$\forall x\in I_v,\ \forall y\in I_w,\quad x < y.$$

Remark 3.14 For any integer $p\ge2$, the usual $p$-adic subdivision of $I$ is a
particular case of $\mathcal{A}$-adic subdivision for which $|\mathcal{A}| = p$ and $|I_w| = p^{-|w|}$ for any
finite word $w\in\mathcal{W}$. For a general $\mathcal{A}$-adic subdivision, the intervals associated
with two $k$-length words need not have the same length.

The inclusion relations between the subintervals $I_w$ of an $\mathcal{A}$-adic subdivision are
thus coded by the prefix order in the complete $|\mathcal{A}|$-ary planar tree. In particular,
for any $w\in\mathcal{W}$ and for any cutset $\mathcal{K}$ of the complete $|\mathcal{A}|$-ary tree,

$$I_w = \bigcup_{v\in\mathcal{K}} I_{wv}$$

(this union is a disjoint one; see Section 2.1 for a definition of a cutset).
We will use the following convention for $\mathcal{A}$-adic subdivisions: we require the
intervals $I_w$ to be open on the left side and closed on the right side, except the
ones of the form $I_{0^n}$ that are compact. Obviously, if $\mu$ is any probability measure
on $\mathcal{R} = \mathcal{A}^{\mathbb{N}}$, there exists a unique $\mathcal{A}$-adic subdivision of $I$ such that $|I_w| = \mu(w\mathcal{R})$
for any $w\in\mathcal{W}$.
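The subdivision attached to a measure can be computed explicitly: the left end point of $I_w$ is the total mass of the words of the same length that come before $w$ in alphabetical order. The sketch below assumes the alphabet $\{0,1\}$ and, purely for illustration, a Bernoulli measure with $P(1) = 0.65$; the function names are hypothetical.

```python
from fractions import Fraction as F


def bernoulli_measure(p):
    """mu(wR) for i.i.d. letters with P(1) = p (an illustrative choice)."""
    def mu(word):
        prob = F(1)
        for letter in word:
            prob *= p if letter == "1" else 1 - p
        return prob
    return mu


def interval(word, mu):
    """End points (left, right) of I_w, with |I_w| = mu(wR)."""
    left = F(0)
    for k, letter in enumerate(word):
        if letter == "1":
            # Words that branch off to 0 at position k come before `word`.
            left += mu(word[:k] + "0")
    return left, left + mu(word)


mu = bernoulli_measure(F(13, 20))
for w in ["0", "1", "10", "11"]:
    print(w, interval(w, mu))
```

The intervals of the words 10 and 11 are adjacent and their union is the interval of the word 1, as required by condition (i) of Definition 3.13.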






Given an $\mathcal{A}$-adic subdivision of $I$, we extend the notation $I_w$ to right-infinite
words by

$$\forall r\in\mathcal{R},\quad I_r = \bigcap_{w\in\mathcal{W},\ w \text{ prefix of } r} I_w.$$

Definition 3.15 A family $(I_v)_{v\in V}$ of subintervals of $I$ indexed by a totally ordered
set $V$ is said to define an ordered topological partition of $I$ when

(i) $I = \bigcup_{v\in V}\mathrm{cl}(I_v)$,
(ii) for any $v, v'\in V$, $v\ne v' \Rightarrow \mathrm{int}(I_v)\cap\mathrm{int}(I_{v'}) = \emptyset$,
(iii) for any $v, v'\in V$,

$$v < v' \Rightarrow \forall x\in I_v,\ \forall x'\in I_{v'},\ x\le x'$$

where $\mathrm{cl}(I_v)$ and $\mathrm{int}(I_v)$ stand respectively for the closure and the interior of $I_v$.
We will denote

$$I = \sum_{v\in V}\uparrow I_v.$$

We will use the following fact: if $I = \sum_{v\in V}\uparrow I_v = \sum_{v\in V}\uparrow J_v$ are two ordered
topological partitions of $I$ indexed by the same denumerable ordered set $V$, then
$I_v = J_v$ for any $v\in V$ as soon as $|I_v| = |J_v|$ for any $v\in V$.

3.2.3 Definition of the mapping T 

Let $(U_n)_{n\ge0}$ be a SVLMC, defined by its probabilized context tree $(\mathcal{T}, (q_c)_{c\in\mathcal{C}})$ and
a stationary² probability measure $\pi$ on $\mathcal{L}$. We first associate with $\pi$ the unique
$\mathcal{A}$-adic subdivision $(I_w)_{w\in\mathcal{W}}$ of $I$, defined by:

$$\forall w\in\mathcal{W},\quad |I_w| = \pi(\overline{w}),$$

(recall that if $w = \alpha_1\ldots\alpha_N$, $\overline{w}$ is the reversed word $\alpha_N\ldots\alpha_1$ and that $\pi(\overline{w})$
denotes $\pi(\mathcal{L}\overline{w})$).

We consider now three ordered topological partitions of $I$.

• The coding partition

It consists in the family $(I_\alpha)_{\alpha\in\mathcal{A}}$:

$$I = \sum_{\alpha\in\mathcal{A}}\uparrow I_\alpha.$$

• The vertical partition

The denumerable set of finite and infinite contexts $\mathcal{C}$ is a cutset of the $\mathcal{A}$-ary tree.
The family $(I_c)_{c\in\mathcal{C}}$ thus defines the so-called vertical ordered topological partition

$$I = \sum_{c\in\mathcal{C}}\uparrow I_c.$$

• The horizontal partition

$\mathcal{AC}$ is the set of leaves of the context tree $\mathcal{AT} = \{\alpha w,\ \alpha\in\mathcal{A},\ w\in\mathcal{T}\}$. As
before, the family $(I_{\alpha c})_{\alpha c\in\mathcal{AC}}$ defines the so-called horizontal ordered topological
partition

$$I = \sum_{\alpha c\in\mathcal{AC}}\uparrow I_{\alpha c}.$$

² Note that this construction can be made replacing $\pi$ by any probability measure on $\mathcal{L}$.

Definition 3.16 The mapping $T : I\to I$ is the unique left continuous function
such that:

(i) the restriction of $T$ to any $I_{\alpha c}$ is affine and increasing;

(ii) for any $\alpha c\in\mathcal{AC}$, $T(I_{\alpha c}) = I_c$.

The function $T$ is always increasing on $I_0$ and on $I_1$. When $q_c(\alpha)\ne0$, the slope
of $T$ on an interval $I_{\alpha c}$ is $1/q_c(\alpha)$. Indeed, with Formula (7), one has

$$|I_{\alpha c}| = \pi(\overline{c}\alpha) = q_c(\alpha)\pi(\overline{c}) = |I_c|\,q_c(\alpha).$$

When $q_c(\alpha) = 0$ and $|I_c|\ne0$, the interval $I_{\alpha c}$ is empty so that $T$ is discontinuous
at $x_c = \pi(\{s\in\mathcal{L},\ \overline{s}\le c\})$ ($\le$ denotes here the alphabetical order on $\mathcal{R}$). Note
that $|I_c| = 0$ implies $|I_{\alpha c}| = 0$. In particular, if one assumes that all the probability
measures $q_c$, $c\in\mathcal{C}$, are nontrivial (i.e. as soon as they satisfy $q_c(0)q_c(1)\ne0$),
then $T$ is continuous on $I_0$ and $I_1$. Furthermore, $T(I_0) = \mathrm{cl}(T(I_1)) = I$ and for
any $c\in\mathcal{C}$, $T^{-1}I_c = I_{0c}\cup I_{1c}$ (see Figure 4).

Example: the four flower bamboo 

The four flower bamboo is the VLMC defined by the finite probabilized context
tree of Figure 5. There exists a unique stationary measure $\pi$ under conditions
which are detailed later, in Example 5.40. We represent on Figure 5 the mapping
$T$ built with this $\pi$, together with the respective subdivisions of x-axis and y-axis
by the four $I_c$ and the eight $I_{\alpha c}$. The x-axis is divided by both coding and
horizontal partitions; the y-axis is divided by both coding and vertical partitions.
This figure has been drawn with the following data on the four flower bamboo:
$q_{00}(0) = 0.4$, $q_{010}(0) = 0.6$, $q_{011}(0) = 0.8$ and $q_1(0) = 0.3$.
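The lengths of the four $I_c$ and of the eight $I_{\alpha c}$ can be approximated numerically from the quoted data. The sketch below is not the authors' method: it assumes that the contexts are 00, 010, 011 and 1, treats the final letters as an order 3 Markov chain, and obtains the stationary law of the last three letters by plain power iteration; all function names are hypothetical.

```python
from itertools import product

# Data quoted in the text for the four flower bamboo: q_c(0) for each context.
Q0 = {"00": 0.4, "010": 0.6, "011": 0.8, "1": 0.3}


def context_of(past):
    """Context of a window of the last three letters, read right-to-left."""
    node = ""
    for letter in reversed(past):
        node += letter
        if node in Q0:
            return node
    raise ValueError("window too short")


def stationary_windows(iterations=2000):
    """Power iteration on the law of the last three letters."""
    states = ["".join(s) for s in product("01", repeat=3)]
    law = {s: 1 / len(states) for s in states}
    for _ in range(iterations):
        new = dict.fromkeys(states, 0.0)
        for s, mass in law.items():
            q0 = Q0[context_of(s)]
            new[s[1:] + "0"] += mass * q0
            new[s[1:] + "1"] += mass * (1 - q0)
        law = new
    return law


def pi(word, law):
    """pi(word): probability that the sequence ends with `word` (length <= 3)."""
    return sum(mass for s, mass in law.items() if s.endswith(word))


law = stationary_windows()
for c, q0 in Q0.items():
    length_c = pi(c[::-1], law)          # |I_c| = pi(reversed c)
    print(c, round(length_c, 4),
          "|I_0c| =", round(length_c * q0, 4),
          "|I_1c| =", round(length_c * (1 - q0), 4))
```

On each $I_{\alpha c}$ the mapping $T$ is then the affine increasing function onto $I_c$, with slope $1/q_c(\alpha)$.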
Figure 4: action of $T$ on horizontal and vertical partitions. On this figure, $c$ is
any context and the alphabet is $\{0,1\}$.

Figure 5: on the left, the four flower bamboo context tree. On the right, its mapping
together with the coding, the vertical and the horizontal partitions of $[0,1]$.

3.2.4 Properties of the mapping T 

The following key lemma explains the action of the mapping $T$ on the intervals of
the $\mathcal{A}$-adic subdivision $(I_w)_{w\in\mathcal{W}}$. More precisely, it extends the relation $T(I_{\alpha c}) = I_c$,
for any $\alpha c\in\mathcal{AC}$, to any finite word.

Lemma 3.17 For any finite word $w\in\mathcal{W}$ and any letter $\alpha\in\mathcal{A}$, $T(I_{\alpha w}) = I_w$.

Proof. Assume first that $w\notin\mathcal{T}$. Let then $c\in\mathcal{C}$ be the unique context such that
$c$ is a prefix of $w$. Because of the prefix order structure of the $\mathcal{A}$-adic subdivision
$(I_v)_v$, one has the first ordered topological partition

$$I_c = \sum_{v\in\mathcal{W},\ |v|=|w|,\ c \text{ prefix of } v}\uparrow I_v \qquad (9)$$

(the set of indices is a cutset in the tree of $c$ descendants). On the other hand,
the same topological partition applied to the finite word $\alpha w$ leads to

$$I_{\alpha c} = \sum_{v\in\mathcal{W},\ |v|=|w|,\ c \text{ prefix of } v}\uparrow I_{\alpha v}.$$

Taking the image by $T$, one gets the second ordered topological partition

$$I_c = \sum_{v\in\mathcal{W},\ |v|=|w|,\ c \text{ prefix of } v}\uparrow T(I_{\alpha v}). \qquad (10)$$

Now, if $c$ is a prefix of a finite word $v$, $I_{\alpha v}\subseteq I_{\alpha c}$ and the restriction of $T$ to $I_{\alpha c}$ is
affine. By Thales theorem, it comes

$$|T(I_{\alpha v})| = |I_{\alpha v}|\cdot\frac{|I_c|}{|I_{\alpha c}|}.$$

Since $\pi$ is a stationary measure for the VLMC,

$$|I_{\alpha c}| = \pi(\overline{c}\alpha) = q_c(\alpha)\pi(\overline{c}) = |I_c|\,q_c(\alpha).$$

Furthermore, one has $\pi(\overline{v}\alpha) = q_c(\alpha)\pi(\overline{v})$. Finally, $|T(I_{\alpha v})| = |I_v|$. Relations (9)
and (10) are two ordered denumerable topological partitions, the components
with the same indices being of the same length: the partitions are necessarily the
same. In particular, because $w$ belongs to the set of indices, this implies that
$T(I_{\alpha w}) = I_w$.

Assume now that $w\in\mathcal{T}$. Since the set of contexts having $w$ as a prefix is a cutset
of the tree of the descendants of $w$, one has the disjoint union

$$I_{\alpha w} = \bigcup_{c\in\mathcal{C},\ w \text{ prefix of } c} I_{\alpha c}.$$

Taking the image by $T$ leads to

$$T(I_{\alpha w}) = \bigcup_{c\in\mathcal{C},\ w \text{ prefix of } c} I_c = I_w$$

and the proof is complete. ■

Remark 3.18 The same proof shows in fact that if $w$ is any finite word, $T^{-1}I_w =
I_{0w}\cup I_{1w}$ (disjoint union).

Lemma 3.19 For any $\alpha\in\mathcal{A}$, for any context $c\in\mathcal{C}$, for any Borelian set $B\subseteq I_c$,

$$|I_\alpha\cap T^{-1}B| = |B|\,q_c(\alpha).$$

Proof. It is sufficient to show the lemma when $B$ is an interval. The restriction
of $T$ to $I_{\alpha c}$ is affine and $T^{-1}I_c = I_{0c}\cup I_{1c}$. The result is thus due to Thales
Theorem. ■

Corollary 3.20 If $T$ is the mapping associated with a SVLMC, Lebesgue measure
is invariant by $T$, i.e. $|T^{-1}B| = |B|$ for any Borelian subset $B$ of $I$.

Proof. Since $B = \bigcup_{c\in\mathcal{C}} B\cap I_c$ (disjoint union), it suffices to prove that $|T^{-1}B| =
|B|$ for any Borelian subset of $I_c$ where $c$ is any context. If $B\subseteq I_c$, because of
Lemma 3.19,

$$|T^{-1}B| = |I_0\cap T^{-1}B| + |I_1\cap T^{-1}B| = |B|\,(q_c(0) + q_c(1)) = |B|.$$

■

3.2.5 SVLMC as dynamical source 

We now consider the stationary probabilistic dynamical source $((I_\alpha)_{\alpha\in\mathcal{A}}, \rho, T, |\cdot|)$
built from the SVLMC. It provides the $\mathcal{A}$-valued random process $(Y_n)_{n\in\mathbb{N}}$ defined
by

$$Y_n = \rho(T^n\xi)$$

where $\xi$ is a uniformly distributed $I$-valued random variable and $\rho$ the coding
function. Since Lebesgue measure is $T$-invariant, all random variables $Y_n$ have
the same law, namely $P(Y_n = 0) = |I_0| = \pi(0)$.

Definition 3.21 Two $\mathcal{A}$-valued random processes $(V_n)_{n\in\mathbb{N}}$ and $(W_n)_{n\in\mathbb{N}}$ are called
symmetrically distributed whenever for any $N\in\mathbb{N}$ and for any finite word
$w\in\mathcal{A}^{N+1}$, $P(W_0W_1\ldots W_N = w) = P(V_0V_1\ldots V_N = \overline{w})$.

In other words, $(V_n)_{n\in\mathbb{N}}$ and $(W_n)_{n\in\mathbb{N}}$ are symmetrically distributed if and only
if for any $N\in\mathbb{N}$, the random words $W_0W_1\cdots W_N$ and $V_NV_{N-1}\ldots V_0$ have the
same distribution.

Theorem 3.22 Let $(U_n)_{n\in\mathbb{N}}$ be a SVLMC and $\pi$ a stationary probability measure
on $\mathcal{L}$. Let $(X_n)_{n\in\mathbb{N}}$ be the process of final letters of $(U_n)_{n\in\mathbb{N}}$. Let $T : I\to I$ be the
mapping defined in Section 3.2.3. Then,

(i) Lebesgue measure is $T$-invariant.

(ii) If $\xi$ is any uniformly distributed random variable on $I$, the processes $(X_n)_{n\in\mathbb{N}}$
and $(\rho(T^n\xi))_{n\in\mathbb{N}}$ are symmetrically distributed.

Proof. (i) has been already stated and proven in Corollary 3.20.
(ii) As before, for any finite word $w = \alpha_0\ldots\alpha_N\in\mathcal{W}$, let $B_w = \bigcap_{k=0}^{N}T^{-k}I_{\alpha_k}$ be
the Borelian set of real numbers $x$ such that the right-infinite sequence $(\rho(T^nx))_{n\in\mathbb{N}}$
has $w$ as a prefix. By definition, $B_\alpha = I_\alpha$ if $\alpha\in\mathcal{A}$. More generally, we
prove the following claim: for any $w\in\mathcal{W}$, $B_w = I_w$. Indeed, if $\alpha\in\mathcal{A}$
and $w\in\mathcal{W}$, $B_{\alpha w} = I_\alpha\cap T^{-1}B_w$; thus, by induction on the length of $w$,
$B_{\alpha w} = I_\alpha\cap T^{-1}I_w = I_{\alpha w}$, the last equality being due to Lemma 3.17. There
is now no difficulty in finishing the proof: if $w\in\mathcal{W}$ is any finite word of length
$N+1$, then $P(X_0\ldots X_N = \overline{w}) = \pi(\overline{w}) = |I_w|$. Thus, because of the claim,
$P(X_0\ldots X_N = \overline{w}) = |B_w| = P(Y_0\ldots Y_N = w)$. This proves the result. ■

4 Examples 

4.1 The infinite comb 

4.1.1 Stationary probability measures 

Consider the probabilized context tree given on the left side of Figure 6. In this
case, there is one infinite leaf $0^\infty$ and countably many finite leaves $0^n1$, $n\in\mathbb{N}$.
The data of a corresponding VLMC consists thus in probability measures on
$\mathcal{A} = \{0,1\}$:

$$q_{0^\infty} \text{ and } q_{0^n1},\ n\in\mathbb{N}.$$

Suppose that $\pi$ is a stationary measure on $\mathcal{L}$. We first compute $\pi(\overline{w})$ (notation
(5)) as a function of $\pi(1)$ when $w$ is any context or any internal node. Because of
Formula (7), $\pi(10) = \pi(1)q_1(0)$ and an immediate induction shows that, for any
$n\ge0$,

$$\pi(10^n) = \pi(1)\,c_n, \qquad (11)$$

where $c_0 = 1$ and, for any $n\ge1$,

$$c_n = \prod_{k=0}^{n-1} q_{0^k1}(0). \qquad (12)$$

Figure 6: infinite comb probabilized context tree (on the left) and the associated
dynamical system (on the right).
The stationary probability of a reversed context is thus necessarily given by
Formula (11). Now, if $0^n$ is any internal node of the context tree, we need to go
down along the branch in $\mathcal{T}$ to reach the contexts; using then the disjoint union
$\pi(0^{n+1}) = \pi(0^n) - \pi(10^n)$, by induction, it comes for any $n\ge0$,

$$\pi(0^n) = 1 - \pi(1)\sum_{k=0}^{n-1}c_k. \qquad (13)$$

The stationary probability of a reversed internal node of the context tree is thus
necessarily given by Formula (13).

It remains to compute $\pi(1)$. The denumerable partition of the whole probability
space given by all cylinders based on leaves in the context tree (Formula (3))
implies $1 - \pi(0^\infty) = \pi(1) + \pi(10) + \pi(100) + \ldots$, i.e.

$$1 - \pi(0^\infty) = \sum_{n\ge0}\pi(1)\,c_n. \qquad (14)$$

This leads to the following statement that covers all cases of existence, unicity
and nontriviality for a stationary probability measure for the infinite comb. In
the generic case (named irreducible case hereunder), we give a necessary and
sufficient condition on the data for the existence of a stationary probability measure;
moreover, when a stationary probability exists, it is unique. The reducible case
is much more singular and gives rise to nonunicity.

Proposition 4.23 (Stationary probability measures for an infinite comb)

Let $(U_n)_{n\ge0}$ be a VLMC defined by a probabilized infinite comb.

(i) Irreducible case
Assume that $q_{0^\infty}(0)\ne1$.

(i.a) Existence

The Markov process $(U_n)_{n\ge0}$ admits a stationary probability measure on $\mathcal{L}$ if and
only if the numerical series $\sum c_n$ defined by (12) converges.

(i.b) Unicity

Assume that the series $\sum c_n$ converges and denote

$$S(1) = \sum_{n\ge0}c_n. \qquad (15)$$

Then, the stationary probability measure $\pi$ on $\mathcal{L}$ is unique; it is characterized by

$$\pi(1) = \frac{1}{S(1)} \qquad (16)$$

and Formulae (11), (13) and (8).

Furthermore, $\pi$ is trivial if and only if $q_1(0) = 0$, in which case it is defined by
$\pi(1^\infty) = 1$.

(ii) Reducible case
Assume that $q_{0^\infty}(0) = 1$.

(ii.a) If the series $\sum c_n$ diverges, then the trivial probability measure $\pi$ on $\mathcal{L}$
defined by $\pi(0^\infty) = 1$ is the unique stationary probability.

(ii.b) If the series $\sum c_n$ converges, then there is a one parameter family of
stationary probability measures on $\mathcal{L}$. More precisely, for any $a\in[0,1]$, there exists
a unique stationary probability measure $\pi_a$ on $\mathcal{L}$ such that $\pi_a(0^\infty) = a$. The
probability $\pi_a$ is characterized by $\pi_a(1) = \frac{1-a}{S(1)}$ and Formulae (11), (13) and (8).
Furthermore, $\pi_a$ is non trivial except in the two following cases:

• $a = 1$, in which case $\pi_1$ is defined by $\pi_1(0^\infty) = 1$;

• $a = 0$ and $q_1(0) = 0$, in which case $\pi_0$ is defined by $\pi_0(1^\infty) = 1$.
Proof. (i) Assume that $q_{0^\infty}(0) \ne 1$ and that $\pi$ is a stationary probability measure. By definition of probability transitions, $\pi(0^\infty) = \pi(0^\infty)\, q_{0^\infty}(0)$ so that $\pi(0^\infty)$ necessarily vanishes. Thus, thanks to (14), $\pi(1) \ne 0$, the series $\sum c_n$ converges and Formula (16) is valid. Moreover, when $w$ is any context or any internal node of the context tree, $\pi(w)$ is necessarily given by Formulae (16), (11) and (13). This shows that for any finite word $w$, $\pi(w)$ is determined by Formula (8). Since the cylinders $\mathcal{L}w$, $w \in \mathcal{W}$, span the $\sigma$-algebra on $\mathcal{L}$, there is at most one stationary probability measure. This proves the only if part of (i.a), the unicity and the characterization claimed in (i.b).

Reciprocally, when the series converges, Formulae (16), (11), (13) and (8) define a probability measure on the semiring spanned by cylinders, which extends to a stationary probability measure on the whole $\sigma$-algebra on $\mathcal{L}$ (see Billingsley [3] for a general treatment on semirings, $\sigma$-algebras, definition and characterization of probability measures). This proves the if part of (i.a). Finally, the definition of $c_n$ directly implies that $S(1) = 1$ if and only if $q_1(0) = 0$. This proves the assertion of (i.b) on the triviality of $\pi$.

(ii) Assume that $q_{0^\infty}(0) = 1$. Formula (14) is always valid so that the divergence of the series $\sum c_n$ forces $\pi(1)$ to vanish and, consequently, any stationary measure $\pi$ to be the trivial one defined by $\pi(0^\infty) = 1$.

Besides, with the assumption $q_{0^\infty}(0) = 1$, one immediately sees that this trivial probability is stationary, proving (ii.a).

To prove (ii.b), assume furthermore that the series $\sum c_n$ converges and let $a \in [0, 1]$. As before, any stationary probability measure $\pi$ is completely determined by $\pi(1)$. Moreover, the probability measure defined by $\pi_a(1) = \frac{1-a}{S(1)}$, Formulae (11), (13) and (8) and standardly extended to the whole $\sigma$-algebra on $\mathcal{L}$ is clearly stationary. Because of Formula (14), it satisfies

$$\pi_a(0^\infty) = 1 - \pi_a(1)\, S(1) = a.$$

This proves the assertion on the one parameter family. Finally, $\pi_a$ is trivial only if $\pi_a(1) \in \{0, 1\}$. If $a = 1$ then $\pi_a(1) = 0$, thus $\pi_1$ is the trivial probability that only charges $0^\infty$. If $a = 0$ then $\pi_a(1) = 1/S(1)$ is nonzero and it equals 1 if and only if $S(1) = 1$, i.e. if and only if $q_1(0) = 0$, in which case $\pi_0$ is the trivial probability that only charges $1^\infty$.
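As a numerical illustration of the irreducible case, the following Python sketch truncates the series $\sum c_n$ and evaluates $\pi(1) = 1/S(1)$, $\pi(10^n) = \pi(1) c_n$ and $\pi(0^n) = 1 - \pi(1)\sum_{k<n} c_k$. The function names, the callable `q_zero` (assumed to return $q_{0^k1}(0)$) and the truncation level `n_terms` are illustrative assumptions, not part of the paper; the truncation is meaningful only when the series converges.

```python
def comb_stationary(q_zero, n_terms=2000):
    """Truncated stationary quantities for an infinite comb (irreducible case).

    q_zero(k) is assumed to return q_{0^k 1}(0).
    Returns (c, S, pi1): c[n] ~ c_n, S ~ S(1) (truncated sum), pi1 = 1 / S.
    """
    c = [1.0]                          # c_0 = 1
    for k in range(n_terms - 1):
        c.append(c[-1] * q_zero(k))    # c_{k+1} = c_k * q_{0^k 1}(0)
    S = sum(c)
    return c, S, 1.0 / S


def pi_context(c, pi1, n):
    """pi(1 0^n) = pi(1) * c_n."""
    return pi1 * c[n]


def pi_internal(c, pi1, n):
    """pi(0^n) = 1 - pi(1) * sum_{k<n} c_k."""
    return 1.0 - pi1 * sum(c[:n])


if __name__ == "__main__":
    a = 0.3                            # hypothetical constant data q_{0^k 1}(0) = a
    c, S, pi1 = comb_stationary(lambda k: a)
    print(S, pi1, pi_context(c, pi1, 2), pi_internal(c, pi1, 3))
```

With constant data $q_{0^k1}(0) = a$ the comb is a memoryless source, so one expects $\pi(1) = 1 - a$ and $\pi(0^n) = a^n$, which gives a quick sanity check of the sketch.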
Remark 4.24 This proposition completes previous results which give sufficient conditions for the existence of a stationary measure for an infinite comb. For instance, in Galves and Löcherbach [12], the intervening condition is

$$\sum_{k \ge 0} q_{0^k1}(1) = +\infty,$$

which is equivalent with our notations to $c_n \to 0$. Note that if $\sum c_n$ is divergent, then the only possible stationary distribution is the trivial Dirac measure $\delta_{0^\infty}$.

4.1.2 The associated dynamical system 

The vertical partition is made of the intervals $I_{0^n1}$ for $n \ge 0$. The horizontal partition consists in the intervals $I_{00^n1}$ and $I_{10^n1}$, for $n \ge 0$, together with two intervals coming from the infinite context, namely $I_{0^\infty}$ and $I_{10^\infty}$. In the irreducible case, $\pi(0^\infty) = 0$ and these two last intervals become two accumulation points of the partition: $0$ and $\pi(0)$. The following lemma is classical and helps understand the behaviour of the mapping $T$ at these accumulation points.

Lemma 4.25 Let $f : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$, differentiable on $]a, b[ \setminus D$ where $D$ is a countable set. The function $f$ admits a right derivative at $a$ and

$$f'_r(a) = \lim_{\substack{x \to a,\ x > a \\ x \notin D}} f'(x)$$

as soon as this limit exists.

Corollary 4.26 If $(q_{0^n1}(0))_{n \in \mathbb{N}}$ converges, then $T$ is differentiable at $0$ and $\pi(0)$ (with a possibly infinite derivative) and

$$T'_r(0) = \lim_{n \to +\infty} \frac{1}{q_{0^n1}(0)}, \qquad T'_r(\pi(0)) = \lim_{n \to +\infty} \frac{1}{q_{0^n1}(1)}.$$

When $(q_{0^n1}(0))_{n \in \mathbb{N}}$ converges to 1, $T'_r(0) = 1$. In this case, $0$ is an indifferent fixed point and $T'_r(\pi(0)) = +\infty$. The mapping $T$ is a slight modification of the so-called Wang map (Wang [27]). The statistical properties of the Wang map are quite well understood (Lambert et al. [17]). The corresponding dynamical source is said to be intermittent.

4.1.3 Dirichlet series. 

For a stationary infinite comb, the Dirichlet series is defined on a suitable vertical open strip of $\mathbb{C}$ as

$$\Lambda(s) = \sum_{w \in \mathcal{W}} \pi(w)^s.$$

In the whole section we suppose that $\sum c_n$ is convergent. Indeed, if it is divergent then the only stationary measure is the Dirac measure $\delta_{0^\infty}$ and $\Lambda(s)$ is never defined.
The computation of the Dirichlet series is tractable because of the following formula: for any finite words $w, w' \in \mathcal{W}$,

$$\pi(w1w')\,\pi(1) = \pi(w1)\,\pi(1w'). \tag{17}$$

This formula, which comes directly from Formula (8), is true because of the very particular form of the contexts in the infinite comb. It is the expression of its renewal property. The computation of the Dirichlet series is made in two steps.

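Step 1. A finite word either does not contain any 1 or is of the form $w10^n$, $w \in \mathcal{W}$, $n \ge 0$. Thus,

$$\Lambda(s) = \sum_{n \ge 0} \pi(0^n)^s + \sum_{n \ge 0} \sum_{w \in \mathcal{W}} \pi(w10^n)^s.$$

Because of Formulae (17) and (16), $\pi(w10^n) = S(1)\,\pi(w1)\,\pi(10^n)$. Let us denote

$$\Lambda_1(s) = \sum_{w \in \mathcal{W}} \pi(w1)^s.$$

With this notation and Formulae (11) and (13),

$$\Lambda(s) = \frac{1}{S(1)^s} \sum_{n \ge 0} R_n^s + \Lambda_1(s) \sum_{n \ge 0} c_n^s,$$

where $R_n$ stands for the rest

$$R_n = \sum_{k \ge n} c_k. \tag{18}$$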

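Step 2. It consists in the computation of $\Lambda_1$. A finite word having 1 as last letter either can be written $0^n1$, $n \ge 0$, or is of the form $w10^n1$, $w \in \mathcal{W}$, $n \ge 0$. Thus it comes,

$$\Lambda_1(s) = \sum_{n \ge 0} \pi(0^n1)^s + \sum_{n \ge 0} \sum_{w \in \mathcal{W}} \pi(w10^n1)^s.$$

By Formulae (17) and (11), $\pi(w10^n1) = \pi(w1)\, c_n\, q_{0^n1}(1) = \pi(w1)(c_n - c_{n+1})$, so that

$$\Lambda_1(s) = \frac{1}{S(1)^s} \sum_{n \ge 0} c_n^s + \Lambda_1(s) \sum_{n \ge 0} (c_n - c_{n+1})^s$$

and

$$\Lambda_1(s) = \frac{1}{S(1)^s} \cdot \frac{\sum_{n \ge 0} c_n^s}{1 - \sum_{n \ge 0} (c_n - c_{n+1})^s}.$$

Putting results of both steps together, we obtain the following proposition.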
Proposition 4.27 With notations (12), (15) and (18), the Dirichlet series of a source obtained from a stationary infinite comb is

$$\Lambda(s) = \frac{1}{S(1)^s} \left[ \sum_{n \ge 0} R_n^s + \frac{\left( \sum_{n \ge 0} c_n^s \right)^2}{1 - \sum_{n \ge 0} (c_n - c_{n+1})^s} \right].$$

Remark 4.28 The analytic function $\dfrac{\left( \sum_{n \ge 0} c_n^s \right)^2}{1 - \sum_{n \ge 0} (c_n - c_{n+1})^s}$ is always singular for $s = 1$ because its denominator vanishes while its numerator is a convergent series.



Examples. (1) Suppose that $0 < a < 1$ and that $q_{0^n1}(0) = a$ for any $n \ge 0$. Then $c_n = a^n$, $R_n = \frac{a^n}{1-a}$ and $S(1) = \frac{1}{1-a}$. For such a source, the Dirichlet series is

$$\Lambda(s) = \frac{1}{1 - [a^s + (1-a)^s]}.$$

In this case, the source is memoryless: all letters are drawn independently with the same distribution. The Dirichlet series of such sources have been extensively studied in Flajolet et al. [8] in the realm of asymptotics of average parameters of a trie.
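The reduction of the general formula of Proposition 4.27 to the memoryless closed form can be checked numerically. The sketch below evaluates $\Lambda(s) = S(1)^{-s}\big[\sum R_n^s + (\sum c_n^s)^2 / (1 - \sum (c_n - c_{n+1})^s)\big]$ on a truncated sequence $c_0, \dots, c_{N-1}$; the function name, the truncation and the restriction to real $s > 1$ are assumptions made for illustration only.

```python
def dirichlet_comb(c, s):
    """Evaluate the comb Dirichlet series on a truncated list c = [c_0, ..., c_{N-1}].

    Assumes a real exponent s > 1 and a non-increasing sequence c whose
    neglected tail (c_N, c_{N+1}, ...) is numerically zero.
    """
    S = sum(c)                               # S(1), truncated
    # R_n = sum_{k >= n} c_k, accumulated from the tail
    R = [0.0] * len(c)
    tail = 0.0
    for n in range(len(c) - 1, -1, -1):
        tail += c[n]
        R[n] = tail
    sum_R = sum(r ** s for r in R)
    sum_c = sum(x ** s for x in c)
    # c_n - c_{n+1}; the last difference uses c_N ~ 0
    diffs = [c[n] - c[n + 1] for n in range(len(c) - 1)] + [c[-1]]
    sum_d = sum(d ** s for d in diffs)
    return (sum_R + sum_c ** 2 / (1.0 - sum_d)) / S ** s


def dirichlet_memoryless(a, s):
    """Closed form of Example (1): 1 / (1 - a^s - (1 - a)^s)."""
    return 1.0 / (1.0 - a ** s - (1.0 - a) ** s)
```

For instance, with $c_n = a^n$ and $a = 0.4$, both functions should agree at $s = 2$ up to rounding, since the truncated tail is negligible.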

(2) Extension of Example 1: take $a, b \in\, ]0, 1[$ and consider the probabilized infinite comb defined by

$$q_{0^n1}(0) = \begin{cases} a & \text{if } n \text{ is even,} \\ b & \text{if } n \text{ is odd.} \end{cases}$$

After computation, the Dirichlet series of the corresponding source under the stationary distribution turns out to have the explicit form

$$\Lambda(s) = \frac{1}{1 - (ab)^s} \left[ 1 + \left( \frac{a + ab}{1 + a} \right)^s + \left( \frac{1 - ab}{1 + a} \right)^s \frac{(1 + a^s)^2}{1 - (ab)^s - (1-a)^s - a^s (1-b)^s} \right].$$

The configuration of poles of $\Lambda$ depends on arithmetic properties (approximation by rationals) of the logarithms of $ab$, $1 - a$ and $a(1 - b)$. The poles of such a series are the same as in the case of a memoryless source with an alphabet of three letters, see Flajolet et al. [8]. This could be extended to a family of examples.

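(3) Let $a > 2$. We take data $q_{0^n1}(0)$, $n \ge 0$, in such a way that $c_0 = 1$ and, for any $n \ge 1$,

$$c_n = \zeta(n, a) := \frac{1}{\zeta(a)} \sum_{k \ge n} \frac{1}{k^a},$$

where $\zeta$ is the Riemann zeta function. Since $c_n \in O(n^{1-a})$ when $n$ tends to infinity, there exists a unique stationary probability measure $\pi$ on $\mathcal{L}$. One obtains

$$S(1) = 1 + \frac{\zeta(a-1)}{\zeta(a)}$$

and, for any $n \ge 1$,

$$R_n = \frac{\zeta(a-1)}{\zeta(a)}\, \zeta(n, a-1) - (n-1)\, \zeta(n, a).$$

In particular, $R_n \in O(n^{2-a})$ when $n$ tends to infinity. The final formula for the Dirichlet series of this source is

$$\Lambda(s) = \frac{1}{S(1)^s} \left[ \sum_{n \ge 0} R_n^s + \frac{\left( 1 + \sum_{n \ge 1} \zeta(n, a)^s \right)^2}{1 - \dfrac{\zeta(as)}{\zeta(a)^s}} \right].$$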



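(4) One case of interest is when the associated dynamical system has an indifferent fixed point (see Section 4.1.2), for example when

$$q_{0^n1}(0) = \left( 1 - \frac{1}{n+2} \right)^a$$

with $1 < a < 2$. In this situation, $c_n = (1 + n)^{-a}$ and

$$\Lambda(s) = \sum_{n \ge 1} \zeta(n, a)^s + \frac{\zeta(as)^2}{\zeta(a)^s} \cdot \frac{1}{1 - \sum_{n \ge 1} \dfrac{1}{n^{as}} \left[ 1 - \left( \dfrac{n}{n+1} \right)^a \right]^s}.$$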



4.1.4 Generating function for the exact distribution of word occurrences in a sequence generated by a comb

The behaviour of the entrance time into cylinders is a natural question arising in dynamical systems. There exists a large literature on the asymptotic properties of entrance times into cylinders for various kinds of systems, symbolic or geometric; see Abadi and Galves [1] for an extensive review on the subject. Most of the results deal with an exponential approximation of the distribution of the first entrance time into a small cylinder, sometimes with error terms. The most up-to-date result on this framework is Abadi and Saussol [2], not published yet, in which the hypotheses are made only in terms of the mixing type of the source (so-called $\alpha$-mixing). We are here interested in exact distribution results instead of asymptotic behaviours.

Several studies in probabilities on words are based on generating functions. For example one may cite Régnier [20], Reinert et al. [21], Stefanov and Pakes [25]. For i.i.d. sequences, Blom and Thorburn give the generating function of the first occurrence of a word, based on a recurrence relation on the probabilities. This result is extended to Markovian sequences by Robin and Daudin [24]. Nonetheless, other approaches are considered: one of the more general techniques is the so-called Markov chain embedding method introduced by Fu [9] and further developed by Fu and Koutras [10], Koutras [16]. A martingale approach (see Gerber and Li [13], Li [18], Williams [28]) is an alternative to the Markov chain embedding method. These two approaches are compared in Pozdnyakov et al. [19].



We establish results on the exact distribution of word occurrences in a random sequence generated by a comb (or a bamboo in Section 4.2.4). More precisely, we make explicit the generating function of the random variable giving the $r$-th occurrence of a $k$-length word, for any word $w$ such that $\overline{w}$ is not an internal node of $\mathcal{T}$.

Let us consider the process $X = (X_n)_{n \ge 0}$ of final letters of $(U_n)_{n \ge 0}$, in the particular case of a SVLMC defined by an infinite comb. Let $w = w_1 \ldots w_k$ be a word of length $k \ge 1$. We say that $w$ occurs at position $n \ge k$ in the sequence $X$ if the word $w$ ends at position $n$:

$$\{w \text{ at } n\} = \{X_{n-k+1} \ldots X_n = w\} = \{U_n \in \mathcal{L}w\}.$$

Let us denote by $T_w^{(r)}$ the position of the $r$-th occurrence of $w$ in $X$ and $\Phi_w^{(r)}$ its generating function:

$$\Phi_w^{(r)}(x) = \sum_{n \ge 0} P\left( T_w^{(r)} = n \right) x^n.$$
The following notation is used in the sequel: for any finite word $u \in \mathcal{W}$, for any finite context $c \in \mathcal{C}$ and for any $n \ge 0$,

$$q_c^{(n)}(u) = P\left( X_{n-|u|+1} \ldots X_n = u \mid X_{-(|c|-1)} \ldots X_0 = \overline{c} \right).$$

These quantities may be computed in terms of the data $q_c$. Proposition 4.29 generalizes results of Robin and Daudin [24].



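Proposition 4.29 For a SVLMC defined by an infinite comb, with the above notations, for a word $w$ such that $\overline{w}$ is not an internal node, the generating function of its first occurrence is given, for $|x| < 1$, by

$$\Phi_w^{(1)}(x) = \frac{x^k \pi(w)}{(1 - x)\, S_w(x)}$$

and the generating function of its $r$-th occurrence is given, for $|x| < 1$, by

$$\Phi_w^{(r)}(x) = \Phi_w^{(1)}(x) \left( 1 - \frac{1}{S_w(x)} \right)^{r-1},$$

where

$$S_w(x) = C_w(x) + \sum_{j \ge k} q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w)\, x^j,$$

$$C_w(x) = 1 + \sum_{j=1}^{k-1} \mathbf{1}_{\{w_{j+1} \ldots w_k = w_1 \ldots w_{k-j}\}}\, q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-j+1} \ldots w_k)\, x^j.$$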



Remark 4.30 The term $C_w(x)$ is a generalization of the probabilized autocorrelation polynomial defined in Jacquet and Szpankowski [15] in the particular case when the $(X_n)_{n \ge 0}$ are independent and identically distributed. For a word $w = w_1 \ldots w_k$ this polynomial is equal to

$$c_w(x) = \sum_{j=0}^{k-1} c_{j,w}\, \frac{x^j}{\pi(w_1 \ldots w_{k-j})},$$

where $c_{j,w} = 1$ if the $(k-j)$-length suffix of $w$ is equal to its $(k-j)$-length prefix, and is equal to zero otherwise. When the $(X_n)_{n \ge 0}$ are independent and identically distributed, we have

$$\sum_{j=1}^{k-1} \mathbf{1}_{\{w_{j+1} \ldots w_k = w_1 \ldots w_{k-j}\}}\, q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-j+1} \ldots w_k)\, x^j = \sum_{j=1}^{k-1} c_{j,w}\, \frac{\pi(w)}{\pi(w_1 \ldots w_{k-j})}\, x^j,$$

that is

$$C_w(x) = \pi(w)\, c_w(x).$$

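Proof. We first deal with $w = 10^{k-1}$, that is the only word $w$ of length $k$ such that $\overline{w} \in \mathcal{C}$. For the sake of shortness, we will denote by $p_n$ the probability that $T_w^{(1)} = n$. From the obvious decomposition

$$\{w \text{ at } n\} = \{T_w^{(1)} = n\} \cup \{T_w^{(1)} < n \text{ and } w \text{ at } n\} \quad \text{(disjoint union)}$$

it comes by stationarity of $\pi$

$$\pi(w) = p_n + \sum_{z=k}^{n-1} p_z\, P\left( X_{n-k+1} \ldots X_n = w \mid T_w^{(1)} = z \right).$$

Due to the renewal property of the comb, the conditional probability can be rewritten

$$P\left( X_{n-k+1} \ldots X_n = w \mid X_{z-k+1} \ldots X_z = w \right) = \begin{cases} q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w) & \text{if } z \le n-k, \\ 0 & \text{if } z > n-k; \end{cases}$$

the second equality is due to the lack of possible overlapping in $w$. Consequently, we have

$$\pi(w) = p_n + \sum_{z=k}^{n-k} p_z\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w).$$

Hence, for $|x| < 1$, it comes

$$\frac{x^k \pi(w)}{1 - x} = \sum_{n=k}^{+\infty} p_n x^n + \sum_{n=k}^{+\infty} x^n \sum_{z=k}^{n-k} p_z\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w),$$

so that

$$\frac{x^k \pi(w)}{1 - x} = \Phi_w^{(1)}(x) \left( 1 + \sum_{j \ge k} x^j\, q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w) \right),$$

which leads to

$$\Phi_w^{(1)}(x) = \frac{x^k \pi(w)}{(1 - x)\, S_w(x)}.$$

Note that when $w = 10^{k-1}$, $C_w(x) = 1$.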

Proceeding in the same way for the $r$-th occurrence, from the decomposition

$$\{w \text{ at } n\} = \{T_w^{(1)} = n\} \cup \{T_w^{(2)} = n\} \cup \ldots \cup \{T_w^{(r)} = n\} \cup \{T_w^{(r)} < n \text{ and } w \text{ at } n\},$$

and denoting $p(n, i) = P(T_w^{(i)} = n)$, the following recursive equation holds:

$$\pi(w) = p_n + p(n, 2) + \ldots + p(n, r) + \sum_{z=k}^{n-1} P\left( T_w^{(r)} = z \text{ and } w \text{ at } n \right).$$

Again, by splitting the last term into two terms and using the non-overlapping structure of $w$, one gets

$$\pi(w) = p_n + p(n, 2) + \ldots + p(n, r) + \sum_{z=k}^{n-k} p(z, r)\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w).$$

From this recursive equation, proceeding exactly in the same way, one gets for the generating function, for $|x| < 1$,

$$\Phi_w^{(r)}(x) = \Phi_w^{(1)}(x) \left( 1 - \frac{1}{S_w(x)} \right)^{r-1}.$$
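Let us now consider the case of words $w$ such that $\overline{w} \notin \mathcal{T}$, that is the words $w$ such that $w_j = 1$ for at least one integer $j \in \{2, \ldots, k\}$. We denote by $i$ the last position of a 1 in $w$, that is $\overleftarrow{\mathrm{pref}}(w) = 0^{k-i}1$. Once again we have

$$\pi(w) = p_n + \sum_{z=k}^{n-1} p_z\, P\left( w \text{ at } n \mid T_w^{(1)} = z \right).$$

When $z \le n - k$, due to the renewal property, the conditional probability can be rewritten as

$$P\left( w \text{ at } n \mid T_w^{(1)} = z \right) = q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w).$$

[Figure: two overlapping occurrences of $w = w_1 \ldots w_k$ ending at positions $z$ and $n$; the letters $w_{n-z+1} \ldots w_k$ of the first occurrence are aligned with the letters $w_1 \ldots w_{k-n+z}$ of the second.]

When $z > n - k$ (see figure above),

$$P\left( w \text{ at } n \mid T_w^{(1)} = z \right) = \mathbf{1}_{\{w_{n-z+1} \ldots w_k = w_1 \ldots w_{k-n+z}\}}\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-n+z+1} \ldots w_k),$$

this equality holding if $n - k + i \ne z$. But when $z = n - k + i$, because the first occurrence of $w$ is at $z$, necessarily $w_k = 1$ and hence $i = k$, and $z = n$ which contradicts $z < n$. Consequently, for $z = n - k + i$, we [...]

Finally one gets

$$\pi(w) = p_n + \sum_{z=k}^{n-k} p_z\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w) + \sum_{z=n-k+1}^{n-1} p_z\, \mathbf{1}_{\{w_{n-z+1} \ldots w_k = w_1 \ldots w_{k-n+z}\}}\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-n+z+1} \ldots w_k),$$

and hence

$$\Phi_w^{(1)}(x) = \frac{x^k \pi(w) / (1 - x)}{1 + \displaystyle\sum_{j \ge k} x^j\, q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w) + \sum_{j=1}^{k-1} x^j\, \mathbf{1}_{\{w_{j+1} \ldots w_k = w_1 \ldots w_{k-j}\}}\, q^{(j)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-j+1} \ldots w_k)}.$$

Proceeding exactly in the same way by induction on $r$, we get the expression of Proposition 4.29 for the $r$-th occurrence. ■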
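The recurrence used in the first part of the proof can be tested on the simplest instance. The Python sketch below assumes a memoryless source (letters i.i.d. with $P(0) = a$), for which $q^{(j)}(w) = \pi(w)$ as soon as $j \ge k$, and the non-overlapping word $w = 10^{k-1}$. It computes $p_n$ from the recurrence, compares it with a brute-force enumeration, and evaluates the closed form $x^k \pi(w) / ((1-x) S_w(x))$ with $S_w(x) = 1 + \pi(w) x^k / (1-x)$. All function names are hypothetical and the i.i.d. setting is a simplifying assumption, not the general comb.

```python
from itertools import product


def first_occurrence_probs(pi_w, k, n_max):
    """p_n = P(first occurrence of w = 1 0^{k-1} ends at n), memoryless case.

    Uses pi(w) = p_n + sum_{z=k}^{n-k} p_z * pi(w); positions start at 1.
    """
    p = [0.0] * (n_max + 1)
    for n in range(k, n_max + 1):
        # the slice p[k : n-k+1] is empty while n < 2k
        p[n] = pi_w - pi_w * sum(p[k:n - k + 1])
    return p


def brute_force(a, w, n):
    """Same probability by enumerating all words of length n, with P(0) = a."""
    total = 0.0
    for letters in product("01", repeat=n):
        x = "".join(letters)
        if x.find(w) == n - len(w):          # first occurrence ends at position n
            total += a ** x.count("0") * (1 - a) ** x.count("1")
    return total


def phi_closed(pi_w, k, x):
    """Closed form x^k pi(w) / ((1 - x) S_w(x)) with S_w(x) = 1 + pi(w) x^k / (1 - x)."""
    S_w = 1.0 + pi_w * x ** k / (1.0 - x)
    return x ** k * pi_w / ((1.0 - x) * S_w)
```

Summing $p_n x^n$ over a long enough range should reproduce `phi_closed` for $|x| < 1$, which is a direct check of the generating function in this special case.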

Remark 4.31 The case of internal nodes w = 0^ is more intricate, due to the 
absence of any symbol 1 allowing a renewal argument. Nevertheless, for the forth- 
coming applications, we will not need the explicit expression of the generating 
function of such words occurrences. 

4.2 The Bamboo blossom 

4.2.1 Stationary probability measures 

Consider the probabilized context tree given by the left side of Figure 7.

Figure 7: bamboo blossom probabilized context tree (on the left) and the associ- 
ated dynamical system (on the right). 

The data of a corresponding VLMC consist in probability measures on $\mathcal{A}$ indexed by the two families of finite contexts

$$(q_{(01)^n1})_{n \ge 0} \quad \text{and} \quad (q_{(01)^n00})_{n \ge 0}$$

together with a probability measure on the infinite context $q_{(01)^\infty}$.
As before, assuming that $\pi$ is a stationary probability measure on $\mathcal{L}$, we compute the probabilities of any $\pi(w)$, $\overline{w}$ being an internal node or $\overline{w}$ being a context, as functions of the data and of both $\pi(1)$ and $\pi(00)$. Determination of stationary probabilities of cylinders based on both contexts 1 and 00 then leads to assumptions that guarantee existence and unicity of such a stationary probability measure.

Computation of $\pi(w)$, $\overline{w}$ context

Two families of cylinders, namely $\mathcal{L}1(10)^n$ and $\mathcal{L}00(10)^n$, correspond to contexts. For any $n \ge 0$, $\pi(1(10)^{n+1}) = \pi(1(10)^n)\, q_{(01)^n1}(1)\, q_1(0)$ and $\pi(00(10)^{n+1}) = \pi(00(10)^n)\, q_{(01)^n00}(1)\, q_1(0)$. A straightforward induction implies thus that for any $n \ge 0$,

$$\begin{cases} \pi(1(10)^n) = \pi(1)\, c_n(1) \\ \pi(00(10)^n) = \pi(00)\, c_n(00) \end{cases} \tag{19}$$

where $c_0(1) = c_0(00) = 1$ and

$$c_n(1) = q_1(0)^n \prod_{k=0}^{n-1} q_{(01)^k1}(1), \qquad c_n(00) = q_1(0)^n \prod_{k=0}^{n-1} q_{(01)^k00}(1)$$

for any $n \ge 1$.

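Computation of $\pi(w)$, $\overline{w}$ internal node

Two families of cylinders, $\mathcal{L}0(10)^n$ and $\mathcal{L}(10)^n$, correspond to internal nodes. By disjoint union of events, they are related by

$$\begin{cases} \pi(0(10)^n) = \pi((10)^n) - \pi(1(10)^n) \\ \pi((10)^{n+1}) = \pi(0(10)^n) - \pi(00(10)^n) \end{cases}$$

for any $n \ge 0$. By induction, this leads to: $\forall n \ge 0$,

$$\begin{cases} \pi(0(10)^n) = 1 - \pi(1)\, S_n(1) - \pi(00)\, S_{n-1}(00) \\ \pi((10)^n) = 1 - \pi(1)\, S_{n-1}(1) - \pi(00)\, S_{n-1}(00) \end{cases}$$

where $S_{-1}(1) = S_{-1}(00) = 0$ and, for any $n \ge 0$,

$$\begin{cases} S_n(1) = \sum_{k=0}^{n} c_k(1) \\ S_n(00) = \sum_{k=0}^{n} c_k(00). \end{cases} \tag{20}$$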
These formulae give, by quotients, the conditional probabilities on internal nodes
defined by (jS]) and appearing in Formula ([H])- 

Computation of $\pi(1)$ and of $\pi(00)$

The context tree defines a partition of the set $\mathcal{L}$ of left-infinite sequences (see (3)).

In the case of bamboo blossom, this partition implies

$$1 - \pi((10)^\infty) = \sum_{n \ge 0} \pi(1(10)^n) + \sum_{n \ge 0} \pi(00(10)^n) \tag{21}$$

$$= \sum_{n \ge 0} \pi(1)\, c_n(1) + \sum_{n \ge 0} \pi(00)\, c_n(00). \tag{22}$$

We denote

$$S(1) = \sum_{n \ge 0} c_n(1), \qquad S(00) = \sum_{n \ge 0} c_n(00) \in [1, +\infty].$$

Note that the series $S(1)$ always converges. Indeed, the convergence is obvious if $q_1(0) \ne 1$; otherwise, $q_1(0) = 1$ and $q_1(1) = 0$, so that any $c_n(1)$, $n \ge 1$, vanishes and $S(1) = 1$. In the same way, the series $S(00)$ is finite as soon as $q_1(0) \ne 1$.

Proposition 4.32 (Stationary measure on a bamboo blossom)

Let $(U_n)_{n \ge 0}$ be a VLMC defined by a probabilized bamboo blossom context tree.

(i) Assume that $q_1(0) \ne 1$, then the Markov process $(U_n)_{n \ge 0}$ admits a stationary probability measure on $\mathcal{L}$ which is unique if and only if $S(1) - S(00)(1 + q_1(0)) \ne 0$.

(ii) Assume that $q_1(0) = 1$.

(ii.a) If $S(00) = \infty$, then $(U_n)_{n \ge 0}$ admits $\pi = \frac{1}{2} \delta_{(10)^\infty} + \frac{1}{2} \delta_{(10)^\infty 1}$ as unique stationary probability measure on $\mathcal{L}$.

(ii.b) If $S(00) < \infty$, then $(U_n)_{n \ge 0}$ admits a one parameter family of stationary probability measures on $\mathcal{L}$.

Proof. (i) Assume that $q_1(0) \ne 1$ and that $\pi$ is a stationary probability measure. Applying (7) gives

$$\pi((10)^\infty) = q_1(0)\, q_{(01)^\infty}(1)\, \pi((10)^\infty) \tag{23}$$

and consequently $\pi((10)^\infty) = 0$. Therefore, Equation (21) becomes $S(1)\pi(1) + S(00)\pi(00) = 1$. We get another linear equation on $\pi(1)$ and $\pi(00)$ by disjoint union of events: $\pi(0) = 1 - \pi(1) = \pi(10) + \pi(00) = \pi(1)\, q_1(0) + \pi(00)$. Thus $\pi(1)$ and $\pi(00)$ are solutions of the linear system

$$\begin{cases} S(1)\, \pi(1) + S(00)\, \pi(00) = 1 \\ [1 + q_1(0)]\, \pi(1) + \pi(00) = 1. \end{cases} \tag{24}$$

This system has a unique solution if and only if the determinantal assumption

$$S(1) - S(00)[1 + q_1(0)] \ne 0$$

is fulfilled, which is a very light assumption (if this determinant happens to be zero, it suffices to modify one value of some $q_u$, $u$ context, for the assumption to be satisfied). Otherwise, when the determinant vanishes, System (24) is reduced to its second equation, so that it admits a one parameter family of solutions. Indeed,

$$1 \le S(1) \le 1 + q_1(0)(1 - q_1(0)) + \sum_{n \ge 2} q_1(0)^n (1 - q_1(0)) = 1 + q_1(0)$$

and $S(00) \ge 1$, so that $S(1) - S(00)(1 + q_1(0)) = 0$ implies that $S(1) = 1 + q_1(0)$ and $S(00) = 1$. In any case, System (24) has at least one solution, which ensures the existence of a stationary probability measure with Formulae (20), (19) and (8) by a standard argumentation. Assertions on unicity are straightforward.

(ii) Assume that $q_1(0) = 1$. This implies $q_1(1) = 0$ and consequently $S(1) = 1$. Thus, $\pi(1)$ and $\pi(00)$ are solutions of

$$\begin{cases} \pi(1) + S(00)\, \pi(00) = 1 - \pi((10)^\infty) \\ 2\pi(1) + \pi(00) = 1, \end{cases} \tag{25}$$

so that, since $S(00) \ge 1$, the determinantal condition $S(1) - S(00)(1 + q_1(0)) \ne 0$ is always fulfilled.

(ii.a) When $S(00) = \infty$, $\pi(00) = 0$, $\pi(1) = \frac{1}{2}$ and $\pi((10)^\infty) = \frac{1}{2}$. This defines uniquely a stationary probability measure $\pi$. Because of (23), $q_{(01)^\infty}(1) = 1$ so that $\pi((10)^\infty 1) = \pi((10)^\infty) = \frac{1}{2}$. This shows that $\pi = \frac{1}{2} \delta_{(10)^\infty} + \frac{1}{2} \delta_{(10)^\infty 1}$.

(ii.b) When $S(00) < \infty$, if we fix the value $a = \pi((10)^\infty)$, System (25) has a unique solution that determines in a unique way the stationary probability measure $\pi_a$. ■
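In the generic case (i), the stationary values $\pi(1)$ and $\pi(00)$ are obtained by solving the $2 \times 2$ system $S(1)\pi(1) + S(00)\pi(00) = 1$, $[1 + q_1(0)]\pi(1) + \pi(00) = 1$. The Python sketch below does this with truncated sums and Cramer's rule. The callables `q_ctx1` and `q_ctx00` (assumed to return $q_{(01)^k1}(1)$ and $q_{(01)^k00}(1)$), the truncation level and the tolerance on the determinant are illustrative assumptions.

```python
def bamboo_stationary(q_ctx1, q_ctx00, n_terms=500):
    """Solve the linear system for pi(1) and pi(00) on a bamboo blossom.

    q_ctx1(k)  is assumed to return q_{(01)^k 1}(1),
    q_ctx00(k) is assumed to return q_{(01)^k 00}(1).
    Only the case q_1(0) != 1 with a non-vanishing determinant is handled.
    """
    q1_zero = 1.0 - q_ctx1(0)              # q_1(0), because (01)^0 1 = 1

    def truncated_sum(q):
        c, total = 1.0, 0.0                # c_0 = 1
        for n in range(n_terms):
            total += c
            c *= q1_zero * q(n)            # c_{n+1} = c_n * q_1(0) * q_{(01)^n .}(1)
        return total

    S1, S00 = truncated_sum(q_ctx1), truncated_sum(q_ctx00)
    det = S1 - S00 * (1.0 + q1_zero)       # determinant of the system
    if abs(det) < 1e-12:
        raise ValueError("determinant vanishes: no unique stationary measure")
    # Cramer's rule for  S1*x + S00*y = 1  and  (1 + q1_zero)*x + y = 1
    pi1 = (1.0 - S00) / det
    pi00 = (S1 - (1.0 + q1_zero)) / det
    return pi1, pi00
```

When every context emits 1 with the same probability $p$, the source is memoryless and the sketch should return $\pi(1) = p$ and $\pi(00) = (1-p)^2$.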
4.2.2 The associated dynamical system

The vertical partition is made of the intervals $I_{(01)^n00}$ and $I_{(01)^n1}$ for $n \ge 0$. The horizontal partition consists in the intervals $I_{0(01)^n00}$, $I_{1(01)^n00}$, $I_{0(01)^n1}$ and $I_{1(01)^n1}$ for $n \ge 0$, together with the two intervals coming from the infinite context, namely $I_{0(01)^\infty}$ and $I_{1(01)^\infty}$. If we make a hypothesis to ensure $\pi((10)^\infty) = 0$, then these two last intervals become two accumulation points of the horizontal partition, $a_0$ and $a_1$. The respective positions of the intervals and the two accumulation points are given by the alphabetical order

$$0(01)^{n-1}00 < 0(01)^n00 < 0(01)^\infty < 0(01)^n1 < 0(01)^{n-1}1$$

$$1(01)^{n-1}00 < 1(01)^n00 < 1(01)^\infty < 1(01)^n1 < 1(01)^{n-1}1$$

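Lemma 4.33 If $(q_{(01)^n00}(0))_{n \in \mathbb{N}}$ and $(q_{(01)^n1}(0))_{n \in \mathbb{N}}$ converge, then $T$ is right and left differentiable in $a_0$ and $a_1$, with possibly infinite derivatives, and

$$T'_\ell(a_0) = \lim_{n \to \infty} \frac{1}{q_{(01)^n00}(0)}, \qquad T'_r(a_0) = \lim_{n \to \infty} \frac{1}{q_{(01)^n1}(0)},$$

$$T'_\ell(a_1) = \lim_{n \to \infty} \frac{1}{q_{(01)^n00}(1)}, \qquad T'_r(a_1) = \lim_{n \to \infty} \frac{1}{q_{(01)^n1}(1)}.$$

Proof. We use Lemma 4.25. ■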





4.2.3 Dirichlet series 

As for the infinite comb, the Dirichlet series of a source generated by a stationary bamboo blossom can be explicitly computed as a function of the SVLMC data. For simplicity, we assume that the generic Condition (i) of Proposition 4.32 is satisfied. An internal node is of the form $(01)^n$ or $(01)^n0$ while a context writes $(01)^n00$ or $(01)^n1$. Therefore, by disjoint union,

$$\Lambda(s) = A(s) + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w00(10)^n)^s + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w1(10)^n)^s$$

where

$$A(s) = \sum_{n \ge 0} \pi((10)^n)^s + \sum_{n \ge 0} \pi(0(10)^n)^s$$

is explicitly given by Formulae (20) and (24). Because of the renewal property of the bamboo blossom, Formula (7) leads by two straightforward inductions to $\pi(w00(10)^n) = \pi(w00)\, c_n(00)$ and $\pi(w1(10)^n) = \pi(w1)\, c_n(1)$ for any $n \ge 0$. This implies that

$$\Lambda(s) = A(s) + \Lambda_{00}(s) \sum_{n \ge 0} c_n^s(00) + \Lambda_1(s) \sum_{n \ge 0} c_n^s(1)$$

where

$$\Lambda_{00}(s) = \sum_{w \in \mathcal{W}} \pi(w00)^s \quad \text{and} \quad \Lambda_1(s) = \sum_{w \in \mathcal{W}} \pi(w1)^s.$$

It remains to compute both Dirichlet series $\Lambda_{00}$ and $\Lambda_1$, which can be done by a similar procedure.

By disjoint union of finite words,

$$\Lambda_{00}(s) = A_{00}(s) + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w00(10)^n00)^s + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w1(10)^n00)^s \tag{26}$$

where

$$A_{00}(s) = \sum_{n \ge 0} \pi((10)^n00)^s + \sum_{n \ge 0} \pi(0(10)^n00)^s$$

and

$$\Lambda_1(s) = A_1(s) + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w00(10)^n1)^s + \sum_{n \ge 0,\, w \in \mathcal{W}} \pi(w1(10)^n1)^s \tag{27}$$

with

$$A_1(s) = \sum_{n \ge 0} \pi((10)^n1)^s + \sum_{n \ge 0} \pi(0(10)^n1)^s.$$

Computation of $A_1$ and $A_{00}$

By disjoint union and Formula (7),

$$\pi((10)^{n+1}00) = \pi(0(10)^n00) - \pi(00(10)^n)\, q_{(01)^n00}(0)\, q_{00}(0), \quad n \ge 0$$

and

$$\pi(0(10)^n00) = \pi((10)^n00) - \pi(1(10)^n)\, q_{(01)^n1}(0)\, q_{00}(0), \quad n \ge 1$$

where $\pi(00(10)^n)$ and $\pi(1(10)^n)$ are already computed probabilities of contexts (Formula (19)). Since $\pi(000) = \pi(00)\, q_{00}(0)$, one gets recursively $\pi((10)^n00)$ and $\pi(0(10)^n00)$ from these two relations as functions of the data. This computes $A_{00}$. A very similar argument leads to an explicit form of $A_1$.
Ultimate computation of $\Lambda_1$ and $\Lambda_{00}$

Start with (26) and (27). As above, for any $n \ge 0$, by induction and with Formula (7),

$$\pi(w00(10)^n00) = \pi(w00)\, c_n(00)\, q_{(01)^n00}(0)\, q_{00}(0).$$

In the same way, but only when $n \ge 1$,

$$\pi(w1(10)^n00) = \pi(w1)\, c_n(1)\, q_{(01)^n1}(0)\, q_{00}(0).$$

Similar computations lead to similar formulae for $\pi(w00(10)^n1)$ and $\pi(w1(10)^n1)$, for any $n \ge 0$. So, (26) and (27) lead to

$$\Lambda_{00}(s) = A_{00}(s) + \Lambda_{100}(s) + \Lambda_{00}(s)\, B_{00}(s) + \Lambda_1(s)\, B_1(s) \tag{28}$$

where $B_{00}(s)$ and $B_1(s)$ are explicit functions of the data and where

$$\Lambda_{100}(s) = \sum_{w \in \mathcal{W}} \pi(w100)^s.$$

As above, after disjoint union of words, splitting by Formula (7) and double induction, one gets

$$\Lambda_{100}(s) = A_{100}(s) + \Lambda_{00}(s)\, C_{00}(s) + \Lambda_1(s)\, C_1(s)$$

where $A_{100}(s)$, $C_{00}(s)$ and $C_1(s)$ are explicit series, functions of the data. Replacing $\Lambda_{100}$ by this value in Formula (28) leads to a first linear equation between $\Lambda_{00}(s)$ and $\Lambda_1(s)$. A second linear equation between them is obtained from (27) by similar arguments. Solving the system formed by both linear equations gives an explicit form of $\Lambda_{00}(s)$ and $\Lambda_1(s)$ as functions of the data, completing the expected computation.

4.2.4 Generating function for the exact distribution of word occurrences in a sequence generated by a bamboo blossom

Let us consider the process $X = (X_n)_{n \ge 0}$ of final letters of $(U_n)_{n \ge 0}$ in the particular case of a SVLMC defined by a bamboo blossom. We only deal with finite words $w$ such that $\overline{w}$ is not an internal node, i.e. $\overline{w}$ is a finite context or $\overline{w} \notin \mathcal{T}$. One can see that such a word of length $k \ge 1$ can be written in the form $*11(10)^\ell 1^p$ or $*00(10)^\ell 1^p$, with $p \in \{0, 1\}$ and $\ell \in \mathbb{N}$, where $*$ stands for any finite word.
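Proposition 4.34 For a SVLMC defined by a bamboo blossom, with notations of Section 4.1.4, the generating function of the first occurrence of a finite word $w = w_1 \ldots w_k$ is given for $|x| < 1$ by

$$\Phi_w^{(1)}(x) = \frac{x^k \pi(w)}{(1 - x)\, S_w(x)}$$

and the generating function of the $r$-th occurrence of $w$ is given by

$$\Phi_w^{(r)}(x) = \Phi_w^{(1)}(x) \left( 1 - \frac{1}{S_w(x)} \right)^{r-1}$$

where

(i) if $w$ is of the form $*00(10)^\ell$ or $*11(01)^\ell 0$, with $\ell \in \mathbb{N}$, $S_w(x)$ is defined in Proposition 4.29 and

(ii) if $w$ is of the form $*00(10)^\ell 1$, $\ell \in \mathbb{N}$,

$$S_w(x) = C_w(x) + \sum_{j=k}^{\infty} q^{(j)}_{1(01)^\ell 00}(w)\, x^j,$$

$$C_w(x) = 1 + \sum_{j=1}^{k-1} \mathbf{1}_{\{w_{j+1} \ldots w_k = w_1 \ldots w_{k-j}\}}\, q^{(j)}_{1(01)^\ell 00}(w_{k-j+1} \ldots w_k)\, x^j,$$

and if $w$ is of the form $*11(01)^\ell$, $\ell \in \mathbb{N}$,

$$S_w(x) = C_w(x) + \sum_{j=k}^{\infty} q^{(j)}_{(10)^\ell 11}(w)\, x^j,$$

$$C_w(x) = 1 + \sum_{j=1}^{k-1} \mathbf{1}_{\{w_{j+1} \ldots w_k = w_1 \ldots w_{k-j}\}}\, q^{(j)}_{(10)^\ell 11}(w_{k-j+1} \ldots w_k)\, x^j.$$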
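Proof. (i) We first deal with the words $w$ such that

$$\overleftarrow{\mathrm{pref}}(w) = (01)^\ell 00 \quad \text{or} \quad \overleftarrow{\mathrm{pref}}(w) = (01)^\ell 1.$$

Let us denote by $p_n$ the probability that $T_w^{(1)} = n$. Proceeding exactly in the same way as for Proposition 4.29, from the decomposition

$$\pi(w) = p_n + \sum_{z=k}^{n-1} p_z\, P\left( X_{n-k+1} \ldots X_n = w \mid T_w^{(1)} = z \right),$$

and due to the renewal property of the bamboo, one has

$$\pi(w) = p_n + \sum_{z=k}^{n-1} p_z\, P\left( X_{n-k+1} \ldots X_n = w \mid X_{z-|\mathrm{suff}(w)|+1} \ldots X_z = \mathrm{suff}(w) \right)$$

$$= p_n + \sum_{z=k}^{n-k} p_z\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w) + \sum_{z=n-k+1}^{n-1} p_z\, \mathbf{1}_{\{w_{n-z+1} \ldots w_k = w_1 \ldots w_{k-n+z}\}}\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-n+z+1} \ldots w_k),$$

where $\mathrm{suff}(w)$ is the suffix of $w$ equal to the reversed word of $\overleftarrow{\mathrm{pref}}(w)$. Hence, for $|x| < 1$, it comes

$$\frac{x^k \pi(w)}{1 - x} = \sum_{n=k}^{+\infty} p_n x^n + \sum_{n=k}^{+\infty} x^n \sum_{z=k}^{n-k} p_z\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w) + \sum_{n=k}^{+\infty} x^n \sum_{z=n-k+1}^{n-1} p_z\, \mathbf{1}_{\{w_{n-z+1} \ldots w_k = w_1 \ldots w_{k-n+z}\}}\, q^{(n-z)}_{\overleftarrow{\mathrm{pref}}(w)}(w_{k-n+z+1} \ldots w_k),$$

which leads to the expression of $\Phi_w^{(1)}(x)$ given in Proposition 4.29. The $r$-th occurrence can be derived exactly in the same way from the decomposition

$$\{w \text{ at } n\} = \{T_w^{(1)} = n\} \cup \{T_w^{(2)} = n\} \cup \ldots \cup \{T_w^{(r)} = n\} \cup \{T_w^{(r)} < n \text{ and } w \text{ at } n\}.$$

(ii) In the particular case of words $w = *00(10)^\ell 1$, the main difference is that the context 1 is not sufficient for the renewal property. The computation relies on the equality

$$P\left( X_{n-k+1} \ldots X_n = w \mid T_w^{(1)} = z \right) = P\left( X_{n-k+1} \ldots X_n = w \mid X_{z-2\ell-2} \ldots X_z = 00(10)^\ell 1 \right).$$

The sketch of the proof remains the same, replacing $q_{\overleftarrow{\mathrm{pref}}(w)}(w)$ by $q_{1(01)^\ell 00}(w)$. The case $w = *11(01)^\ell$ is analogous. ■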

5 Some remarks, extensions and open problems 

5.1 Stationary measure for a general VLMC 

Infinite comb and bamboo blossom are two instructive but very particular examples, close to renewal processes. Nevertheless, we think that an analogue of Proposition 4.23 or Proposition 4.32 can be written for a VLMC defined by a general context tree with a finite or denumerable number of infinite branches.
In order to generalize the proofs, it is clear that Formula (8) in Lemma 3.11 is crucial. In this formula, for a given finite word $w = \alpha_1 \ldots \alpha_N \in \mathcal{W}$ it is important to check whether the subwords $\overleftarrow{\mathrm{pref}}(\alpha_1 \ldots \alpha_k)$, $k < N$, are internal nodes of the tree or not. Consequently, the following concept of minimal context is natural.

Definition 5.35 (Minimal context) Define the following binary relation on the set of the finite contexts as follows:

$$\forall u, v \in \mathcal{C}^f, \quad u \prec v \iff \exists w, w' \in \mathcal{W},\ v = wuw'$$

(in other words $u$ is a sub-word of $v$). This relation is a partial order. In a context tree, a finite context is called minimal when it is minimal for this partial order on contexts.

Remark 5.36 (Alternative definition of a minimal context) Let $\mathcal{T}$ be a context tree. Let $c = \alpha_N \cdots \alpha_1$ be a finite context of $\mathcal{T}$. Then $c$ is minimal if and only if $\forall k \in \{1, \ldots, N-1\}$, $\overleftarrow{\mathrm{pref}}(\alpha_1 \ldots \alpha_k) \notin \mathcal{C}^f(\mathcal{T})$.

Example 5.37 In the infinite comb, the only minimal context is 1. In the bamboo blossom, the minimal contexts are 1 and 00.
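The sub-word order of Definition 5.35 is easy to test on a finite family of contexts. The Python sketch below returns the contexts that contain no other context of the family as a contiguous sub-word. The function name is hypothetical, contexts are represented as strings over {0, 1}, and the infinite trees are replaced by finite truncations, which is an assumption of the illustration.

```python
def minimal_contexts(contexts):
    """Return the contexts that have no other context as a contiguous sub-word."""
    ctx = set(contexts)
    return sorted(c for c in ctx if not any(u != c and u in c for u in ctx))


if __name__ == "__main__":
    # Truncated infinite comb: contexts 0^n 1 for n <= 3
    comb = ["0" * n + "1" for n in range(4)]
    # Truncated bamboo blossom: contexts (01)^n 1 and (01)^n 00 for n <= 2
    bamboo = ["01" * n + "1" for n in range(3)] + ["01" * n + "00" for n in range(3)]
    print(minimal_contexts(comb))
    print(minimal_contexts(bamboo))
```

On these truncations the function should single out the context 1 for the comb and the contexts 1 and 00 for the bamboo blossom, in agreement with Example 5.37.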

Remark 5.38 There exist some trees with infinitely many infinite leaves and a finite number of minimal contexts. Take the infinite comb and at each $0^k$ branch another infinite comb. In such a tree, the finite leaf 10 is the only minimal context. Nonetheless, a tree with a finite number of infinite contexts has necessarily a finite number of minimal contexts.

As one can see for the infinite comb or for the bamboo blossom (see Sections 4.1.1 and 4.2.1), minimal contexts play a special role in the computation of stationary probability measures. First of all, when $\pi$ is a stationary probability measure and $w$ a finite word such that $\overline{w} \notin \mathcal{T}$, Formula (8) implies that $\pi(w)$ is a rational monomial of the data $q_c(\alpha)$ and of the $\pi(u)$ where $\overline{u}$ belongs to $\mathcal{T}$. This shows that any stationary probability is determined by its values on the nodes of the context tree. In both examples, we compute these values as functions of the data and of the $\pi(m)$, where $m$ are minimal contexts, and we finally write a rectangular linear system satisfied by these $\pi(m)$. Assuming that this system has maximal rank can be viewed as making an irreducibility condition for the Markov chain on $\mathcal{L}$. We conjecture that this situation happens in any case of VLMC.
In the following example, we detail the above procedure, in order to understand how the two main principles (the partition (3) and the disjoint union) give the linear system leading to the irreducibility condition.
Example 5.39 Let $\mathcal{T}$ be a probabilized context tree corresponding to Figure 8 (finite comb with $n + 1$ teeth). There are two minimal contexts: 1 and $0^{n+1}$. Assume that $\pi$ is a stationary probability measure on $\mathcal{L}$. Like in the case of the infinite comb, the probability of a word that corresponds to a tooth is $\pi(10^k) = \pi(1)\, c_k$, $0 \le k \le n$, where $c_k$ is the product defined by (12). Also, the probabilities of the internal nodes and of the handle are

$$\pi(0^k) = 1 - \pi(1)\, S_{k-1}, \quad 0 \le k \le n + 1,$$

where $S_p := \sum_{j=0}^{p} c_j$. By means of these formulae, $\pi$ is determined by $\pi(1)$.

Figure 8: $(n + 1)$-teeth comb probabilized context tree.

In order to compute $\pi(1)$, one can proceed as follows. First, by the partition principle (3), we have $1 = \pi(0^{n+1}) + \pi(1) \sum_{k=0}^{n} c_k$. Secondly, by disjoint union,

$$\pi(0^{n+1}) = \pi(0^{n+2}) + \pi(10^{n+1}) = \pi(0^{n+1})\, q_{0^{n+1}}(0) + \pi(10^n)\, q_{0^n1}(0).$$

This implies the linear relation between both minimal contexts probabilities:

$$\begin{cases} \pi(0^{n+1}) + S_n\, \pi(1) = 1 \\ q_{0^{n+1}}(1)\, \pi(0^{n+1}) - q_{0^n1}(0)\, c_n\, \pi(1) = 0. \end{cases}$$

In particular, this leads to the irreducibility condition $q_{0^{n+1}}(1)\, S_n + q_{0^n1}(0)\, c_n \ne 0$ for the VLMC to admit a stationary probability measure. One can check that this irreducibility condition is the classical one for the corresponding $\mathcal{A}$-valued Markov chain of order $n + 1$.
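The two-equation system of this example can be solved in closed form: $\pi(1) = q_{0^{n+1}}(1) / D$ and $\pi(0^{n+1}) = q_{0^n1}(0)\, c_n / D$, where $D = q_{0^{n+1}}(1)\, S_n + q_{0^n1}(0)\, c_n$ is the quantity appearing in the irreducibility condition. The Python sketch below implements this; the function name and the list-based encoding of the data are assumptions made for illustration.

```python
def finite_comb_stationary(q_teeth_zero, q_handle_one):
    """Stationary values pi(1) and pi(0^{n+1}) for a comb with n + 1 teeth.

    q_teeth_zero[k] is assumed to hold q_{0^k 1}(0) for k = 0..n,
    q_handle_one    is assumed to hold q_{0^{n+1}}(1).
    """
    n = len(q_teeth_zero) - 1
    c = [1.0]                                    # c_0 = 1
    for k in range(n):
        c.append(c[-1] * q_teeth_zero[k])        # c_{k+1} = c_k * q_{0^k 1}(0)
    S_n = sum(c)                                 # S_n = c_0 + ... + c_n
    D = q_handle_one * S_n + q_teeth_zero[n] * c[n]
    if D == 0:
        raise ValueError("irreducibility condition fails")
    pi1 = q_handle_one / D
    pi_handle = q_teeth_zero[n] * c[n] / D
    return pi1, pi_handle
```

For a memoryless source with $P(0) = a$ (all data equal), one expects $\pi(1) = 1 - a$ and $\pi(0^{n+1}) = a^{n+1}$, which the sketch reproduces.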

Example 5.40 Let $\mathcal{T}$ be a probabilized context tree corresponding to Figure (four flower bamboo). This tree provides another example of computation procedure using Formulae (7) and (8), the partition principle (3) and the disjoint union. This VLMC admits a unique stationary probability measure if the determinantal condition

$$q_{00}(1)[1 + q_1(0)] + q_1(0)^2\, q_{010}(0) + q_1(0)\, q_1(1)\, q_{011}(0) \ne 0$$

is satisfied; it is fulfilled if none of the $q_c$ is trivial.

5.2 Tries 

In a first kind of problems, $n$ words independently produced by a source are inserted in a trie. There are results on the classical parameters of the trie (size, height, path length) for a dynamical source (Clément et al. [6]), which rely on the existence of a spectral gap for the underlying dynamical system. We would like to extend these results to cases when there is no spectral gap, as may be guessed in the infinite comb example.

Another interesting application consists in producing a suffix trie from one sequence coming from a VLMC dynamical source, and analyzing its parameters. For his analysis, Szpankowski [26] puts some mixing assumptions (called strong $\alpha$-mixing) on the source. A first direction consists in trying to find the mixing type of a VLMC dynamical source. In a second direction, we plan to use the generating function for the occurrence of words to improve these results.